[tei-council] Internationalised domains

Stuart A. Yeates syeates at gmail.com
Thu Sep 22 15:25:20 EDT 2011


I was under the impression that non-latin-1 wasn't allowed in
data.pointer (and looking through the relevant standards I still can't
see how it is), but such things seem to validate, so I guess you are right.

So I'd like to apologize for my misunderstanding and withdraw
my suggestion.

cheers
stuart

On Thu, Sep 22, 2011 at 4:03 AM, Kevin Hawkins
<kevin.s.hawkins at ultraslavonic.info> wrote:
> I still don't see why Stuart wouldn't simply put this in the TEI:
>
> <name sameAs="http://موقع.وزارة-الاتصالات.مصر/">Egyptian Ministry of
> Communication and Information Technology</name>
>
> <idno>http://موقع.وزارة-الاتصالات.مصر/</idno>
>
> and be done with it.  Generation of percent encoding and Punycode would
> be done by XSLT that produces whatever is used by the delivery system.
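
As a rough sketch of what generating the ASCII form at output time might look like: the suggestion above mentions XSLT, but Python's standard library is used here instead, purely for illustration. The helper name to_ascii_url is hypothetical, and the built-in "idna" codec implements the older IDNA 2003 rules, so this is an example under those assumptions rather than a recommended pipeline.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Example IRI in its readable Unicode form, as typed in this thread.
EXAMPLE_IRI = "http://موقع.وزارة-الاتصالات.مصر/"


def to_ascii_url(iri: str) -> str:
    """Return an ASCII-only URL for delivery (hypothetical helper).

    The host is converted to Punycode with Python's built-in "idna" codec
    (IDNA 2003) and non-ASCII path characters are percent-encoded as UTF-8.
    Any userinfo is dropped; query and fragment are passed through unchanged.
    """
    parts = urlsplit(iri)
    host = parts.hostname or ""
    ascii_host = host.encode("idna").decode("ascii")
    if parts.port is not None:
        ascii_host = f"{ascii_host}:{parts.port}"
    # Keep "/" and existing "%" escapes so the path is not double-encoded.
    ascii_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, ascii_host, ascii_path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(to_ascii_url(EXAMPLE_IRI))
```

The stored document would keep the readable Unicode form, and only the delivery step would call something like to_ascii_url. One caveat: the result depends on the exact code points typed. The Unicode form as it appears in this thread spells الاتصالات with a plain alef (U+0627), and for that spelling the codec yields a middle label that differs from the xn----rmckbbajlc6dj7bxne2c quoted in this thread, which appears to correspond to a spelling with alef-hamza (U+0623).
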
>
> --Kevin
>
> On 9/20/2011 11:27 PM, Stuart A. Yeates wrote:
>> The situations I am trying to avoid are:
>>
>> <name sameas="urn:example:%D9%85%D9%88%D9%82%D8%B9.%D9%88%D8%B2%D8%A7%D8%B1%D8%A9-%D8%A7%D9%84%D8%A7%D8%AA%D8%B5%D8%A7%D9%84%D8%A7%D8%AA.%D9%85%D8%B5%D8%B1"
>> copyOf="urn:example:xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c"
>> corresp="http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/"
>> key="http://موقع.وزارة-الاتصالات.مصر/">Egyptian Ministry of
>> Communication and Information Technology</name>
>>
>> and
>>
>> <idno>http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/</idno>
>> vs <idno>http://موقع.وزارة-الاتصالات.مصر/</idno>
>>
>> etc
>>
>> URLs are already using punycode in the domain part and percent
>> escaping in the file part (at least when they're used in data.pointer),
>> and XML has some pretty strong dependencies on URLs, so neither can
>> be prohibited without serious consequences.
>>
>> Both punycode and percent encoding are mappings of UTF-8; they can be
>> converted back and forth with a 1:1 mapping. They are not violations
>> of the "use UTF-8" rule.
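
The round trip described above can be illustrated with a minimal sketch, assuming Python's standard library (the built-in "idna" codec, which follows IDNA 2003, and urllib.parse). The variable names are illustrative only. Strictly speaking, Punycode encodes Unicode code points while percent encoding here works on the UTF-8 bytes, but both decode back to the same string.

```python
from urllib.parse import quote, unquote

# Host name from the thread's example, in its readable Unicode form.
HOST = "موقع.وزارة-الاتصالات.مصر"

# Punycode ("xn--" labels) via the built-in IDNA 2003 codec.
puny = HOST.encode("idna").decode("ascii")
# Percent-encoding of the UTF-8 bytes; "." and "-" are left as they are.
pct = quote(HOST, safe="")

# Both ASCII forms decode back to the original string.
from_puny = puny.encode("ascii").decode("idna")
from_pct = unquote(pct)

print(puny)
print(pct)
print(from_puny == HOST, from_pct == HOST)
```

The round trip is only exact for names that are already in normalised, lower-case form, since the IDNA step case-folds its input before encoding.
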
>>
>> cheers
>> stuart
>>
>>
>> On Wed, Sep 21, 2011 at 10:21 AM, Martin Holmes <mholmes at uvic.ca> wrote:
>>> I agree. I think punycode is a temporary solution to problems with
>>> Internet infrastructure and user-agent limitations; if it's to be used,
>>> it should be generated during output processing, rather than being part
>>> of the core document. TEI XML should be in UTF-8, I think.
>>>
>>> Cheers,
>>> Martin
>>>
>>> On 11-09-20 03:12 PM, Kevin Hawkins wrote:
>>>> I guess what I'm saying is that Punycode is prescribed for use with the
>>>> Domain Name System, but our TEI documents might outlive DNS or be used
>>>> in a system that doesn't use DNS.  After all, even URIs (as
>>>> prescribed in RFC 3986) give DNS as an example of a name registry
>>>> mechanism, not the only one.
>>>>
>>>> We tie ourselves to a few external standards (maintained by the W3C)
>>>> which may become obsolete at some point, but I'm not sure whether we
>>>> should add systems maintained by ICANN to the list.
>>>>
>>>> --Kevin
>>>>
>>>> On 9/20/2011 2:31 PM, Stuart A. Yeates wrote:
>>>>> Punycode is already required (and happens automatically with modern
>>>>> tools and formats) for URIs. View the source of the (UTF-8) web page
>>>>> of my example website to see what I mean.
>>>>>
>>>>> The issue is when people put URIs in free text fields where the
>>>>> tools are unaware that these are URIs and expect them to 'just work'.
>>>>>
>>>>> cheers
>>>>> stuart
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Sep 21, 2011 at 1:26 AM, Kevin Hawkins
>>>>> <kevin.s.hawkins at ultraslavonic.info> wrote:
>>>>>> I'm not sure about prescribing use of RFC 3492.  This seems to me like
>>>>>> prescribing use of US-ASCII with character entity references instead of
>>>>>> UTF-8 within XML documents to ensure that we can use our documents with
>>>>>> a full range of software tools -- something that fewer and fewer people
>>>>>> support doing.
>>>>>>
>>>>>> On 9/20/2011 4:49 AM, Stuart A. Yeates wrote:
>>>>>>> Currently domain names in TEI can occur in typed fields (such as
>>>>>>> data.pointer) or in many other fields where type checking is more
>>>>>>> relaxed (or non-existent). I would like to propose the following note
>>>>>>> to appear somewhere in the standard (I'm thinking the data.pointer
>>>>>>> page, but I'm open to suggestions). The URL in the example is perhaps
>>>>>>> the best-known punycode URL (see
>>>>>>> http://en.wikipedia.org/wiki/Masr_%28domain_name%29 ), but if Arabic
>>>>>>> script causes problems in the publishing process I can probably find a
>>>>>>> more Latin-esque one.
>>>>>>>
>>>>>>> cheers
>>>>>>> stuart
>>>>>>>
>>>>>>> ----
>>>>>>>
>>>>>>> Internationalised domains containing non-ASCII characters should
>>>>>>> always be escaped using RFC 3492 syntax ("punycode"). Thus
>>>>>>> http://موقع.وزارة-الاتصالات.مصر/ is written
>>>>>>> http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/. Such escaping
>>>>>>> permits internationalised domains to be used with a full range of
>>>>>>> software tools.
>>>>>>>
>>>>>>> ----
>>>>>>> _______________________________________________
>>>>>>> tei-council mailing list
>>>>>>> tei-council at lists.village.Virginia.EDU
>>>>>>> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
>>>>>>>
>>>>>>> PLEASE NOTE: postings to this list are publicly archived
>>>
>>> --
>>> Martin Holmes
>>> University of Victoria Humanities Computing and Media Centre
>>> (mholmes at uvic.ca)

