[tei-council] Internationalised domains

Kevin Hawkins kevin.s.hawkins at ultraslavonic.info
Wed Sep 21 12:03:05 EDT 2011


I still don't see why Stuart wouldn't simply put this in the TEI:

<name sameAs="http://موقع.وزارة-الاتصالات.مصر/">Egyptian Ministry of
Communication and Information Technology</name>

<idno>http://موقع.وزارة-الاتصالات.مصر/</idno>

and be done with it.  Generation of percent encoding and Punycode would 
be done by XSLT that produces whatever is used by the delivery system.
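
To make that output step concrete, here is a minimal sketch in Python
rather than XSLT. The function names are made up for illustration, it
relies on Python's built-in "idna" codec (which follows the older IDNA
2003 rules), and it ignores userinfo and query escaping, so treat it as
an outline under those assumptions, not a finished converter.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Return an ASCII-only form of a URL with a Unicode host and path.

    Hypothetical helper: userinfo is dropped and the query string is
    passed through untouched to keep the sketch short.
    """
    parts = urlsplit(url)
    # The "idna" codec Punycode-encodes each non-ASCII label of the
    # host and adds the "xn--" prefix (IDNA 2003 behaviour).
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Percent-encode the UTF-8 bytes of the path; "/" is left as-is.
    path = quote(parts.path, safe="/")
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def to_unicode_host(ascii_host: str) -> str:
    """Reverse the host conversion, e.g. for display."""
    return ascii_host.encode("ascii").decode("idna")
```

Calling to_ascii_url("http://موقع.وزارة-الاتصالات.مصر/") at delivery time
would leave the TEI source itself in plain UTF-8, and to_unicode_host()
shows that the host conversion can be undone.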

--Kevin

On 9/20/2011 11:27 PM, Stuart A. Yeates wrote:
> The situations I am trying to avoid are:
>
> <name sameAs="urn:example:%D9%85%D9%88%D9%82%D8%B9.%D9%88%D8%B2%D8%A7%D8%B1%D8%A9-%D8%A7%D9%84%D8%A7%D8%AA%D8%B5%D8%A7%D9%84%D8%A7%D8%AA.%D9%85%D8%B5%D8%B1"
> copyOf="urn:example:xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c"
> corresp="http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/"
> key="http://موقع.وزارة-الاتصالات.مصر/">Egyptian Ministry of
> Communication and Information Technology</name>
>
> and
>
> <idno>http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/</idno>
> vs <idno>http://موقع.وزارة-الاتصالات.مصر/</idno>
>
> etc
>
> URLs are already using punycode in the domain part and percent
> escaping in the file part (at least when they're used in data.pointer),
> and XML has some pretty strong dependencies on URLs, so neither can
> be prohibited without serious consequences.
>
> Both punycode and percent encoding are mappings of UTF-8, they can be
> converted back and forth with a 1:1 mapping. They are not violations
> of the "use UTF-8" rule.
>
> cheers
> stuart
>
>
> On Wed, Sep 21, 2011 at 10:21 AM, Martin Holmes <mholmes at uvic.ca> wrote:
>> I agree. I think punycode is a temporary solution to problems with
>> Internet infrastructure and user-agent limitations; if it's to be used,
>> it should be generated during output processing, rather than being part
>> of the core document. TEI XML should be in UTF-8, I think.
>>
>> Cheers,
>> Martin
>>
>> On 11-09-20 03:12 PM, Kevin Hawkins wrote:
>>> I guess what I'm saying is that Punycode is prescribed for use with the
>>> Domain Name System, but our TEI documents might outlive DNS or be used
>>> in a system that doesn't use DNS.  After all, even URIs (as
>>> prescribed in RFC 3986) give DNS as an example of a name registry
>>> mechanism, not the only one.
>>>
>>> We tie ourselves to a few external standards (maintained by the W3C)
>>> which may become obsolete at some point, but I'm not sure whether we
>>> should add systems maintained by ICANN to the list.
>>>
>>> --Kevin
>>>
>>> On 9/20/2011 2:31 PM, Stuart A. Yeates wrote:
>>>> Punycode is already required (and happens automatically with modern
>>>> tools and formats) for URIs. View the source of the (UTF-8) web page
>>>> of my example website to see what I mean.
>>>>
>>>> The issue is when people put URIs in free text fields where the
>>>> tools are unaware that these are URIs and expect them to 'just work'.
>>>>
>>>> cheers
>>>> stuart
>>>>
>>>>
>>>>
>>>> On Wed, Sep 21, 2011 at 1:26 AM, Kevin Hawkins
>>>> <kevin.s.hawkins at ultraslavonic.info> wrote:
>>>>> I'm not sure about prescribing use of RFC 3492.  This seems to me like
>>>>> prescribing use of US-ASCII with character entity references instead of
>>>>> UTF-8 within XML documents to ensure that we can use our documents with
>>>>> a full range of software tools -- something that fewer and fewer people
>>>>> support doing.
>>>>>
>>>>> On 9/20/2011 4:49 AM, Stuart A. Yeates wrote:
>>>>>> Currently domain names in TEI can occur in typed fields (such as
>>>>>> data.pointer) or in many other fields where type checking is more
>>>>>> relaxed (or non-existent). I would like to propose the following note
>>>>>> to appear somewhere in the standard (I'm thinking the data.pointer
>>>>>> page, but I'm open to suggestions). The URL in the example is perhaps
>>>>>> the best-known punycode URL (see
>>>>>> http://en.wikipedia.org/wiki/Masr_%28domain_name%29 ), but if Arabic
>>>>>> script causes problems in the publishing process I can probably find a
>>>>>> more Latin-esque one.
>>>>>>
>>>>>> cheers
>>>>>> stuart
>>>>>>
>>>>>> ----
>>>>>>
>>>>>> Internationalised domains containing non-ASCII characters should
>>>>>> always be escaped using RFC 3492 syntax ("punycode"). Thus
>>>>>> http://موقع.وزارة-الاتصالات.مصر/ is written
>>>>>> http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/. Such escaping
>>>>>> permits internationalised domains to be used with a full range of
>>>>>> software tools.
>>>>>>
>>>>>> ----
>>>>>> _______________________________________________
>>>>>> tei-council mailing list
>>>>>> tei-council at lists.village.Virginia.EDU
>>>>>> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
>>>>>>
>>>>>> PLEASE NOTE: postings to this list are publicly archived
>>
>> --
>> Martin Holmes
>> University of Victoria Humanities Computing and Media Centre
>> (mholmes at uvic.ca)