4.0995 Unicode and ECMA (2/254)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Thu, 7 Feb 91 22:21:35 EST

Humanist Discussion Group, Vol. 4, No. 0995. Thursday, 7 Feb 1991.


(1) Date: Thu, 07 Feb 91 22:06:48 EST (36 lines)
From: Allen Renear <EDITORS@BROWNVM>
Subject: Unicode Comment Period Closing

(2) Date: Thu, 31 Jan 91 08:21:32 PST (218 lines)
From: Mike Ksar <ksar@HPCEA.CE.HP.COM>
Subject: ECMA letter to Unicode
Originally Posted on: Multi-byte Code Issues <ISO10646@JHUVM>
Forwarded by Allen Renear <EDITORS@BROWNVM>

(1) --------------------------------------------------------------------
Date: Thu, 07 Feb 91 22:06:48 EST
From: Elaine Brennan & Allen Renear <EDITORS@BROWNVM>
Subject: unicode

The Unicode 1.0 comment period ends February 15th.

Let me reiterate James O'Donnell's suggestion that all interested
humanists -- and particularly anyone who works with languages that
are written in non-Latin alphabets -- send for a copy of the
Unicode Final Review Document. It is available free from...

MICROSOFT!ASMUSF@UUNET.UU.NET.
(they are Express Mailing it now)

As O'Donnell and Robin Cover pointed out, this standard is
well supported by the computing industry and may have a considerable
effect on humanities computing.

There is also a somewhat competing multi-byte character coding standard
being developed by ISO, ISO10646. This standard, Unicode, and
general topics in the coding of diacritics and non-Latin writing
systems are discussed on the listserv list ISO10646 at JHUVM.
You can subscribe by sending mail to LISTSERV at JHUVM with the
following line as the body:

SUBSCRIBE ISO10646 <your name>

Anyone interested in alphabets, writing systems, languages, and
computing will find this list fascinating.

As O'Donnell pointed out, ECMA -- the European Computer Manufacturers
Association -- has endorsed ISO10646. Below is a letter from the
ECMA Secretary General commenting on Unicode...


(2) --------------------------------------------------------------------
Date: Thu, 31 Jan 91 08:21:32 PST
From: Mike Ksar <ksar@HPCEA.CE.HP.COM>
Subject: ECMA letter to Unicode
Originally Posted on: Multi-byte Code Issues <ISO10646@JHUVM>
Forwarded by Allen Renear <EDITORS@BROWNVM>

Below is a message that has been received from ECMA TC1 addressed to
Asmus which describes the position of ECMA TC1 on Unicode 1.0.

Mike


> 29th January 1991
>
>
>
>
> Sir,
>
> I have received the document entitled UNICODE 1.0. It was
> discussed at the meeting of TC1, the ECMA coding committee.
> The views of ECMA in matters of multiple-byte coding are as
> follows.
>
> As a matter of principle ECMA believes strongly that a coded
> character set for world-wide multi-lingual applications must
> be developed by the appropriate, world-wide recognized
> standardization organizations, viz. ISO and IEC. ECMA, as an
> A-liaison organization of them, contributes to, participates
> actively in, and supports, their work.
>
> The task of ISO/IEC/JTC1/SC2 and its WG2 is to develop a
> universal coded character set which will include "all"
> scripts of the world and to provide the coding scheme
> necessary to achieve this aim. Whilst WG2 tried first to
> achieve this with a 2-byte coding, it was rapidly discovered
> that this would not suffice, hence the present structure of
> ISO/IEC/DIS 10646. The aim of Unicode is obviously to limit
> itself to a 16-bit code and to represent with it as many
> graphic characters as possible.
>
> The two approaches are fundamentally different in aims and
> means. Because of this basic difference of approach, the
> following aspects inherent to UNICODE do not meet the strict
> criteria established for a world-wide, universal coded
> character set.
>
> i) Defined repertoire
>
> The repertoire, that is the number of characters which
> can be represented in coded form by means of the bit
> combinations of the code, is undefined. Indeed, the
> use of "floating" accents and, in general, the
> facility to combine the images of two or more graphic
> characters into one single graphic symbol representing
> a character not included in the basic coded set yields
> a practically infinite or, at best, undefined
> repertoire.
>
> ii) Conformance
>
> As a consequence of i) it is generally impossible to
> define the requirements for conformance. Because of
> the possibility of duplicate coding (see iv) below)
> the same set of data could be coded in different ways
> and these different codings could all satisfy the same
> conformance clause for CC-data-elements, which would
> be completely against the well established principle
> of unique coding. Moreover, the absence of a
> defined, finite repertoire makes it generally
> impossible to determine the conformance requirements
> for a receiving character-imaging device.
>
> iii) Fixed-length coding
>
> UNICODE is not a true 16-bit code, since some accented
> characters may require 32 or 48 bits for their coded
> representation, depending on the number of associated
> "non-spacing" diacritical marks.
>
> It is well known that handling of strings of coded
> characters with a different length of coded
> representations causes problems, in particular for
> programming languages.
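
[Not part of the ECMA letter.] A minimal sketch of the arithmetic behind
point iii), assuming a present-day Python 3 interpreter and its
"utf-16-be" codec as a stand-in for a 16-bit code. The sample strings and
the helper name code_units_16 are illustrative choices, not taken from
the Unicode 1.0 draft.

```python
def code_units_16(text):
    """Number of 16-bit code units needed to store text (UTF-16, no BOM)."""
    return len(text.encode("utf-16-be")) // 2

# A base letter followed by zero, one, or two non-spacing marks.
samples = {
    "plain": "e",                  # e
    "one_mark": "e\u0301",         # e + COMBINING ACUTE ACCENT
    "two_marks": "e\u0302\u0301",  # e + COMBINING CIRCUMFLEX + COMBINING ACUTE
}

bits = {name: 16 * code_units_16(s) for name, s in samples.items()}
for name, n in bits.items():
    print(f"{name}: {n} bits")
```

Each sample is one accented letter to a reader, yet the three need 16, 32,
and 48 bits respectively, which is the variable length the letter objects
to.
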
>
> iv) Duplicate coding
>
> Alternative coded representations are available for
> many of the accented characters, since some of them
> are coded as single characters, and can also be
> represented as a pair of characters using a "non
> spacing" diacritical mark. This causes much difficulty
> in string-search operations in text processing, and
> for key-matching in data bases.
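
[Not part of the ECMA letter.] The string-search difficulty in point iv)
can be sketched as follows, assuming Python 3 and its standard
unicodedata module. The normalization step shown (NFC) is a later
development and is not something the letter or the Unicode 1.0 draft
describes; the function names are hypothetical.

```python
import unicodedata

precomposed = "\u00e9"   # e with acute accent as a single coded character
decomposed = "e\u0301"   # e followed by a non-spacing acute accent

def naive_find(haystack, needle):
    """Code-by-code search, as a simple text processor would do it."""
    return needle in haystack

def normalized_find(haystack, needle):
    """Search after bringing both strings to one canonical form (NFC)."""
    nfc = lambda s: unicodedata.normalize("NFC", s)
    return nfc(needle) in nfc(haystack)

text = "caf" + decomposed    # stored with the two-code representation
query = "caf" + precomposed  # typed with the single-code representation

print(naive_find(text, query))       # the two codings do not match
print(normalized_find(text, query))  # they match once both are normalized
```

The naive search fails even though both strings display identically; the
same mismatch would affect key comparison in a database unless both sides
are first reduced to one agreed representation.
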
>
> v) Ideographic characters
>
> The position of ECMA and of most, if not all, European
> National Standards Institutes is that proposals for
> the coded representation of ideographic characters
> must be the subject of review and approval by the
> National Standards Institute of the countries directly
> concerned and not imposed by a private consortium.
>
> Handling ideographic characters in Unicode is possible only
> by unification of the Han characters. This unification is
> difficult due to the open nature of ideographic characters:
> their exact number is not known, and new ones can be
> invented over time. Asian
> countries are planning to form a joint research group
> to study this matter with academic, cultural and legal
> considerations. Only a proposal agreed by them should
> be included in an International Standard.
>
> ISO 10646 uses a code structure allowing the inclusion
> of Chinese, Japanese and Korean ideographic characters
> as distinct characters.
>
> A conversion from ISO 10646 to Unicode would cause an
> information loss as three distinct characters from ISO
> 10646 would be mapped on one single character in
> Unicode. Ideographic characters must be displayed with
> the appropriate font for each country (by user
> preference/demand and by regulations), thus some kind
> of local information must be carried. Local
> information is also necessary in order to process
> characters since character attributes are different.
>
> For all these reasons we are not supporting a set of
> "unified" Han characters outside the private-use
> planes of ISO 10646 as long as such a set is not
> supported by all relevant Asian countries.
>
> vi) Use of the control functions areas
>
> Because of its limitation to a 16-bit code table,
> Unicode assigns graphic characters in the areas
> corresponding to the C0 and C1 sets of control
> functions in ISO 2022. This will cause considerable
> difficulties with many existing communication systems
> and products which assume the code structure of ISO
> 2022. The migration of 8-bit systems to multiple-byte
> code will, thus, be impaired.
>
> ISO 6429 is the repertoire of control functions
> adopted by ISO. It specifies them not only in terms of
> exact definitions but it also allocates a precise
> coding. The corresponding bit combinations must be
> retained when these control functions are used in
> the multiple-byte environment of ISO 10646.
>
> ECMA TC1 are preparing a revision of their Standard
> ECMA-48 (on which ISO 6429 is based) in which specific
> control functions for bi-directional texts and for
> text communication will be included. It is essential
> that for these particularly sensitive applications no
> additional problems arise due to the coding of these
> control functions.
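
[Not part of the ECMA letter.] One way to read the concern in point vi):
in the 8-bit code structure of ISO 2022, octet values 0x00-0x1F (C0) and
0x80-0x9F (C1) are reserved for control functions, so a system that
inspects a 16-bit code one octet at a time may see "controls" inside
ordinary graphic characters. The sketch below assumes that reading; the
function name and the three sample code values are chosen only for
illustration.

```python
C0 = range(0x00, 0x20)  # C0 control area of an 8-bit code
C1 = range(0x80, 0xA0)  # C1 control area of an 8-bit code

def control_area_octets(code):
    """Return the octets of a 16-bit code value that fall in C0 or C1."""
    high, low = code >> 8, code & 0xFF
    return [octet for octet in (high, low) if octet in C0 or octet in C1]

for code in (0x0041, 0x2190, 0x4E2D):
    hits = [f"0x{octet:02X}" for octet in control_area_octets(code)]
    print(f"0x{code:04X}: {hits if hits else 'no octet in C0/C1'}")
```

Of the three sample values, 0x0041 has its high octet in C0, 0x2190 has
its low octet in C1, and 0x4E2D has neither; only the last would pass
untouched through a device that treats C0 and C1 octets as controls.
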
>
> vii) Character naming
>
> ISO has established a methodology for the generation
> of unique names of characters world-wide. This is
> needed not only for coherence in the coding work, it
> has also been strongly required by other disciplines
> such as programming languages. The present scheme has
> been discussed and agreed between JTC1/SC2, SC21 and
> SC22. The adoption of alternative names will only
> cause unnecessary confusion.
>
> viii) Presentation forms
>
> Again, because of its inherent limitations as a 16-bit
> code, only a minute fraction of the presentation forms
> required can be included. In particular, the
> requirements for Arabic presentation forms cannot be
> satisfied by Unicode. The necessary number of such
> presentation forms has been established by the ECMA
> Arabic Task Group in co-operation with recognized
> experts from Arabic countries and with the National
> Standards Institutes of these countries. Further input
> was also received from China and the United Kingdom.
> It is out of the question to reduce this number.
>
> This latter example, in addition to that of the ideographic
> characters, illustrates the need to develop international
> standards in the international, recognized standardization
> organizations, viz. ISO and IEC and not in a private group with a
> participation practically limited to North America.
>
> The conclusions of the discussion and review by the members
> of ECMA TC1 lead to the very firm opinion that:
>
> - the approach of ISO 10646 is the only right one for a
> world-wide universal coded character set,
>
> - the present structure of ISO 10646 offers the possibility
> to allocate planes for private use, thus other coding
> schemes like Unicode and/or a private unified set of
> ideographic characters could be allocated to such planes.
>
> ECMA TC1 will continue to participate in and strongly support
> the efforts of ISO/IEC/JTC1/SC2 and its WG2 toward the issue
> of ISO 10646 and will oppose alternative proposals. ECMA will
> contribute to the work for further complements to the first
> issue of this International Standard.
>
>
>
> Yours faithfully,
>
>
>
>
> D. Hekimi
> Secretary General
>
>