[tei-council] Internationalised domains

Stuart A. Yeates syeates at gmail.com
Fri Oct 7 16:31:43 EDT 2011


On Sat, Oct 8, 2011 at 5:40 AM, Kevin Hawkins
<kevin.s.hawkins at ultraslavonic.info> wrote:
> On 10/7/2011 1:58 AM, Stuart A. Yeates wrote:
>> On Fri, Oct 7, 2011 at 4:59 PM, Kevin Hawkins
>> <kevin.s.hawkins at ultraslavonic.info>  wrote:
>>
>> <snip great summary>
>>
>>> But I fear that I have missed the point of Punycode here.
>>
>> The advantage of punycode over percent encoding is that DNS doesn't
>> accept % encoding and making it accept percent encoding would require
>> rewriting portions of network stacks in embedded devices (i.e. every
>> router on the planet).
>
> So Stuart, are you suggesting that we revise
> http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-data.pointer.html to
> prescribe RFC 3492 (Punycode) encoding of URIs, which should follow RFC
> 3986 except for not using percent encoding?

I am proposing that we prescribe a specific encoding for at least the
path (file) part of URLs.

My rationale is that without a prescribed encoding there is (a)
ambiguity about where one URL stops and the next starts in lists of
1–∞ URLs and (b) ambiguity about whether a URL is already encoded,
which means generic conversions to HTML, ODF, RDF, etc. have to guess
the encoding of URLs and sometimes get it wrong.
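
A minimal sketch of point (a), assuming the list of URLs is
whitespace-separated. The example.org URLs and the function names are
made up for illustration only.

```python
from urllib.parse import quote

# Hypothetical URLs; the first one contains a literal space.
RAW_URLS = ["http://example.org/my file.xml", "http://example.org/other.xml"]


def join_unencoded(urls):
    """Join URLs with spaces without encoding them first."""
    return " ".join(urls)


def join_encoded(urls):
    """Percent-encode each URL (keeping ':' and '/') before joining."""
    return " ".join(quote(u, safe=":/") for u in urls)


# Splitting the unencoded list yields three tokens instead of two,
# because the space inside the first URL looks like a separator.
print(join_unencoded(RAW_URLS).split())

# Splitting the encoded list recovers exactly two URLs.
print(join_encoded(RAW_URLS).split())
```

With the space written as %20, a consumer can split on whitespace
without guessing.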

My preferred approach is to defer to the W3C / IETF rather than
reinventing the wheel.

As I understand it, the W3C recommends the use of IRIs, which are
mapped to URIs (see RFC 3987), but allows legacy applications to
follow different rules (see http://www.w3.org/TR/leiri/ ). I suggest
we go with the recommendation.

As I understand it, this gives URLs that use:

* Unicode Normalization Form C (NFC), encoded as UTF-8
* lower-case, Punycode-encoded Internationalized Domain Names where necessary
* upper-case percent-encoded triplets where necessary
* no /./ and /../ path segments
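
The four points above can be sketched roughly as follows. This is an
illustration under stated assumptions, not a full RFC 3987
implementation: the function names and the bücher.example URL are
hypothetical, userinfo is ignored, dot-segment removal is simplified
for absolute paths, and Python's built-in "idna" codec follows the
older IDNA 2003 rules.

```python
import re
import unicodedata
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left alone when percent-encoding; '%' is kept so that
# existing triplets are not encoded a second time.
SAFE = "/%:@!$&'()*+,;="


def remove_dot_segments(path):
    """Simplified removal of '.' and '..' segments from an absolute path."""
    out = []
    segments = path.split("/")
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == ".":
            if last:
                out.append("")
        elif seg == "..":
            if len(out) > 1:
                out.pop()
            if last:
                out.append("")
        else:
            out.append(seg)
    return "/".join(out)


def upper_triplets(text):
    """Upper-case the hex digits of every percent-encoded triplet."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), text)


def iri_to_uri(iri):
    """Map an IRI to a URI along the lines of the four points above."""
    parts = urlsplit(unicodedata.normalize("NFC", iri))
    host = parts.hostname or ""          # already lower-cased by urlsplit
    if host:
        host = host.encode("idna").decode("ascii")   # Punycode labels
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    path = upper_triplets(quote(remove_dot_segments(parts.path), safe=SAFE))
    query = upper_triplets(quote(parts.query, safe=SAFE + "?"))
    fragment = upper_triplets(quote(parts.fragment, safe=SAFE + "?"))
    return urlunsplit((parts.scheme, host, path, query, fragment))


print(iri_to_uri("http://Bücher.example/a/./b/../résumé.xml"))
# prints http://xn--bcher-kva.example/a/r%C3%A9sum%C3%A9.xml
```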

To return to the original question, the answer is: No, I suggest we
revise http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-data.pointer.html
to follow RFC 3987 in all details. I further suggest that we include
some motivating / worked examples.

cheers
stuart

