[tei-council] Internationalised domains

Stuart A. Yeates syeates at gmail.com
Sun Sep 25 15:41:19 EDT 2011


Full UTF-8 in the file part of URIs would seem to be a disaster for
us. Without whitespace being escaped we can't have whitespace-separated
lists of URLs, so wouldn't the definition of @corresp as "1–∞
occurrences of data.pointer separated by whitespace" no longer work?

cheers
stuart
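A minimal Python sketch of the problem, assuming @corresp is tokenised by splitting on whitespace (as XML Schema list types are); the example.org pointers are hypothetical. An unescaped space in a path turns two pointers into three tokens, while the percent-encoded form keeps the list intact:

```python
from urllib.parse import quote

# Hypothetical @corresp value: two pointers, the first with a raw space in its path.
raw = "http://example.org/my file.xml http://example.org/other.xml"

# Whitespace-separated list semantics: split on any run of whitespace.
print(raw.split())  # three tokens, not two

# Percent-encoding the space in the path keeps each pointer a single token.
pointers = ["http://example.org/" + quote("my file.xml"), "http://example.org/other.xml"]
encoded = " ".join(pointers)
print(encoded.split())
```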

On Fri, Sep 23, 2011 at 9:03 AM, Martin Holmes <mholmes at uvic.ca> wrote:
> I think I see the source of the confusion. Older W3C drafts seem to have
> explicitly addressed the issue of encoding URIs in US-ASCII:
>
> <http://www.w3.org/TR/2001/WD-charmod-20010126/#sec-URIs>
>
> but that section seems to have disappeared from the current draft:
>
> <http://www.w3.org/TR/charmod/>
>
> which, on a quick reading, leaves me with the impression that UTF-8,
> UTF-16 etc. are acceptable encodings.
>
> Cheers,
> Martin
>
> On 11-09-22 12:25 PM, Stuart A. Yeates wrote:
>> I was under the impression that non-Latin-1 characters weren't allowed
>> in data.pointer (and looking through the relevant standards I still can't
>> see how they are), but such things seem to validate, so I guess you are
>> right.
>>
>> So I'd like to apologize for my misunderstanding and withdraw
>> my suggestion.
>>
>> cheers
>> stuart
>>
>> On Thu, Sep 22, 2011 at 4:03 AM, Kevin Hawkins
>> <kevin.s.hawkins at ultraslavonic.info> wrote:
>>> I still don't see why Stuart wouldn't simply put this in the TEI:
>>>
>>> <name sameAs="http://موقع.وزارة-الاتصالات.مصر/">Egyptian Ministry of
>>>   Communication and Information Technology</name>
>>>
>>> <idno>http://موقع.وزارة-الاتصالات.مصر/</idno>
>>>
>>> and be done with it.  Generation of percent encoding and Punycode would
>>> be done by XSLT that produces whatever is used by the delivery system.
>>>
>>> --Kevin
>>>
>>> On 9/20/2011 11:27 PM, Stuart A. Yeates wrote:
>>>> The situations I am trying to avoid are:
>>>>
>>>> <name sameAs="urn:example:%D9%85%D9%88%D9%82%D8%B9.%D9%88%D8%B2%D8%A7%D8%B1%D8%A9-%D8%A7%D9%84%D8%A7%D8%AA%D8%B5%D8%A7%D9%84%D8%A7%D8%AA.%D9%85%D8%B5%D8%B1"
>>>> copyOf="urn:example:xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c"
>>>> corresp="http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/"
>>>> key="http://موقع.وزارة-الاتصالات.مصر/">Egyptian Ministry of
>>>> Communication and Information Technology</name>
>>>>
>>>> and
>>>>
>>>> <idno>http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/</idno>
>>>> vs <idno>http://موقع.وزارة-الاتصالات.مصر/</idno>
>>>>
>>>> etc
>>>>
>>>> URLs already use Punycode in the domain part and percent-encoding
>>>> in the file part (at least when they're used in data.pointer), and XML
>>>> has some pretty strong dependencies on URLs, so neither can be
>>>> prohibited without serious consequences.
>>>>
>>>> Both Punycode and percent-encoding are mappings of UTF-8; they can be
>>>> converted back and forth with a 1:1 mapping, so they are not violations
>>>> of the "use UTF-8" rule.
>>>>
>>>> cheers
>>>> stuart
>>>>
>>>>
>>>> On Wed, Sep 21, 2011 at 10:21 AM, Martin Holmes <mholmes at uvic.ca> wrote:
>>>>> I agree. I think punycode is a temporary solution to problems with
>>>>> Internet infrastructure and user-agent limitations; if it's to be used,
>>>>> it should be generated during output processing, rather than being part
>>>>> of the core document. TEI XML should be in UTF-8, I think.
>>>>>
>>>>> Cheers,
>>>>> Martin
>>>>>
>>>>> On 11-09-20 03:12 PM, Kevin Hawkins wrote:
>>>>>> I guess what I'm saying is that Punycode is prescribed for use with the
>>>>>> Domain Name System, but our TEI documents might outlive DNS or be used
>>>>>> in a system that doesn't use DNS. After all, even URIs (as
>>>>>> prescribed in RFC 3986) give DNS as an example of a name registry
>>>>>> mechanism, not the only one.
>>>>>>
>>>>>> We tie ourselves to a few external standards (maintained by the W3C)
>>>>>> which may become obsolete at some point, but I'm not sure whether we
>>>>>> should add systems maintained by ICANN to the list.
>>>>>>
>>>>>> --Kevin
>>>>>>
>>>>>> On 9/20/2011 2:31 PM, Stuart A. Yeates wrote:
>>>>>>> Punycode is already required (and happens automatically with modern
>>>>>>> tools and formats) for URIs. View the source of the (UTF-8) web page
>>>>>>> of my example website to see what I mean.
>>>>>>>
>>>>>>> The issue is when people put URIs in free-text fields where the
>>>>>>> tools are unaware that these are URIs and expect them to 'just work'.
>>>>>>>
>>>>>>> cheers
>>>>>>> stuart
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 21, 2011 at 1:26 AM, Kevin Hawkins
>>>>>>> <kevin.s.hawkins at ultraslavonic.info> wrote:
>>>>>>>> I'm not sure about prescribing use of RFC 3492.  This seems to me like
>>>>>>>> prescribing use of US-ASCII with character entity references instead of
>>>>>>>> UTF-8 within XML documents to ensure that we can use our documents with
>>>>>>>> a full range of software tools, something that fewer and fewer people
>>>>>>>> support doing.
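The analogy can be illustrated with a short Python sketch (the sample string is simply the final label of the example domain): writing non-ASCII text as US-ASCII with numeric character references keeps the document readable by ASCII-only tools, and any XML parser recovers the original characters, much as Punycode is an ASCII stand-in for a Unicode domain label.

```python
import xml.etree.ElementTree as ET

text = "مصر"

# Serialise as US-ASCII, replacing each non-ASCII character with a numeric character reference.
ascii_xml = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_xml)  # &#1605;&#1589;&#1585;

# An XML parser turns the references back into the same characters.
parsed = ET.fromstring(f"<p>{ascii_xml}</p>").text
assert parsed == text
```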
>>>>>>>>
>>>>>>>> On 9/20/2011 4:49 AM, Stuart A. Yeates wrote:
>>>>>>>>> Currently domain names in TEI can occur in typed fields (such as
>>>>>>>>> data.pointer) or in many other fields where type checking is more
>>>>>>>>> relaxed (or non-existent). I would like to propose the following note
>>>>>>>>> to appear somewhere in the standard (I'm thinking the data.pointer
>>>>>>>>> page, but I'm open to suggestions). The URL in the example is perhaps
>>>>>>>>> the best-known punycode URL (see
>>>>>>>>> http://en.wikipedia.org/wiki/Masr_%28domain_name%29 ), but if Arabic
>>>>>>>>> script causes problems in the publishing process I can probably find a
>>>>>>>>> more Latin-esque one.
>>>>>>>>>
>>>>>>>>> cheers
>>>>>>>>> stuart
>>>>>>>>>
>>>>>>>>> ----
>>>>>>>>>
>>>>>>>>> Internationalised domains containing non-ASCII characters should
>>>>>>>>> always be escaped using RFC 3492 syntax ("punycode"). Thus
>>>>>>>>> <http://موقع.وزارة-الاتصالات.مصر/> is written as
>>>>>>>>> <http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/>. Such escaping
>>>>>>>>> permits internationalised domains to be used with a full range of
>>>>>>>>> software tools.
>>>>>>>>>
>>>>>>>>> ----
>>>>>>>>> _______________________________________________
>>>>>>>>> tei-council mailing list
>>>>>>>>> tei-council at lists.village.Virginia.EDU
>>>>>>>>> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
>>>>>>>>>
>>>>>>>>> PLEASE NOTE: postings to this list are publicly archived
>>>>>
>>>>> --
>>>>> Martin Holmes
>>>>> University of Victoria Humanities Computing and Media Centre
>>>>> (mholmes at uvic.ca)
>
> --
> Martin Holmes
> University of Victoria Humanities Computing and Media Centre
> (mholmes at uvic.ca)

