4.1126 ISO10646: Coptic/Greek; AFII Glyph Registry (2/186)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Tue, 5 Mar 91 20:28:00 EST

Humanist Discussion Group, Vol. 4, No. 1126. Tuesday, 5 Mar 1991.


(1) Date: Tue, 5 Mar 91 20:41:48 MET (23 lines)
From: Harry Gaylord <galiard@let.rug.nl>
Subject: Coptic and Greek Papyrology

(2) Date: Tue 5 Mar 91 18:07:54-EST (163 lines)
From: bbeeton <BNB@MATH.AMS.COM>
Subject: Codes -- Unicode, ISO10646, and the AFII glyph registry

(1) --------------------------------------------------------------------
Date: Tue, 5 Mar 91 20:41:48 MET
From: Harry Gaylord <galiard@let.rug.nl>
Subject: Coptic and Greek Papyrology

If you do not work on Coptic texts or in Greek papyrology, please
disregard this message.

Greek papyrologists need more symbols than normal researchers in Greek
texts. The new standards for character sets seem to be concentrated upon
normal Greek texts. If you need more characters in your work than are
described in the TLG complete set, please send me a list of the
additional character descriptions. I will include the necessary
characters in either the Dutch Standards Institute commentary to the
proposals for the new ISO 10646 standard or work out entity declarations
as set out in the ISO SGML standard.

Coptic has been removed from the proposed ISO 10646 standard. If you have
proposals for adding Coptic characters, please contact me. There are a
number of Coptic characters attached to the Greek section of UNICODE, but
I am not sure if this is complete.

Harry Gaylord (galiard@let.rug.nl)
Groningen University, The Netherlands
(2) --------------------------------------------------------------175---
Date: Tue 5 Mar 91 18:07:54-EST
From: bbeeton <BNB@MATH.AMS.COM>
Subject: Codes -- Unicode, ISO10646, and the AFII glyph registry

This message is in response to comments regarding Unicode and ISO 10646,
from Edwin Hart and Johan van Wingen, that appeared on the ISO10646 list
during the period 15-18 Feb 91, were reposted to the Humanist list, and
forwarded to TeX-EURO.

Both coding and glyph representation are matters of considerable interest
to members of the ISO Text and Office Systems working group (SC18/WG8);
this group is currently defining standards for use in publishing as well
as in office text communication. The Association for Font Information
Interchange (AFII) is also an interested party. AFII has been designated
as the registrar for glyph and glyph collection identifiers under ISO
10036, the registration adjunct to ISO 9541, the recently approved
standard for Font Information Interchange.

The draft standard for Document Style Semantics and Specification
Language (DSSSL, ISO/IEC CD 10179, document ISO/IEC JTC 1/SC 18 N 2837,
1991-02-05, as forwarded to the ITTF for DIS processing) contains on
page 30 the following:

3.3 Bit combination and character mapping
3.3.1 Reference model

The mapping of a particular bit combination in the source document into
a glyph is specified through a two-step process. In the first part of
the map, the bit combination is mapped into a generic character. In the
second part of the map, the generic character is mapped into a glyph.
A generic character is normally associated with a specific glyph, and
is not dependent on a particular font or type of processing. More than
one character-to-glyph association table may have to be defined and
associated with different parts of the source instance in cases where
the semantic meaning is ambiguous (e.g., decimal point or hyphen) or
if for other reasons a single map is inappropriate to the context.
Semantic-specific glyph selection rules are applied during the formatting
process and may be dependent on such things as font, document context,
or state of the transformation process. These rules can cause selection
of glyphs other than primary glyphs, and the selection is not necessarily
on a one-to-one basis (e.g., ligatures). Glyph substitution rules are
applied during formatting when a selected glyph is not available (e.g.,
substituting individual glyphs for ligatures).

For each generic character, a set of attributes describing character
properties shall be specified in the Generic Character Property
Definition. These may be used by the formatter to control processes
such as line breaks and hyphenation.
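
As an illustration only (it is not part of the DSSSL draft), the two-step
reference model quoted above might be sketched in Python as follows. Every
table entry, glyph name, and function name here is hypothetical; real glyph
identifiers would come from a registry, not from this example.

```python
# Hypothetical tables; none of these values come from DSSSL or AFII.

# Step 1: bit combination -> generic character (a descriptive name here).
CODE_TO_CHAR = {
    0x66: "latin small f",
    0x69: "latin small i",
    0x2E: "full stop",
}

# Step 2: generic character -> primary glyph identifier.
CHAR_TO_GLYPH = {
    "latin small f": "glyph.f",
    "latin small i": "glyph.i",
    "full stop": "glyph.period",
}

# Selection rule applied during formatting: two glyphs may be replaced
# by one (so the mapping is not one-to-one), e.g. a ligature.
LIGATURES = {("glyph.f", "glyph.i"): "glyph.fi"}


def map_codes(codes):
    """Apply both steps of the map to a sequence of bit combinations."""
    return [CHAR_TO_GLYPH[CODE_TO_CHAR[c]] for c in codes]


def select_glyphs(glyphs, available):
    """Use a ligature where the font provides it; otherwise keep the
    individual glyphs (the substitution case in the quoted text)."""
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        lig = LIGATURES.get(pair)
        if lig is not None and lig in available:
            out.append(lig)
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out
```

Calling `select_glyphs(map_codes([0x66, 0x69, 0x2E]), {"glyph.fi"})` forms
the ligature, while passing an empty set for `available` leaves the
individual glyphs in place.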

Page 31 contains the following:

3.3.1.6 Glyph index dictionary

Conceptually, the content of the formatted result is expressed using
glyph identifiers. To achieve a more compact representation, a mapping
from multi-byte glyph identifiers to single-byte compacted glyph
identifiers, together with indications of the mapping change, may be
used.

Note 21 Glyph identifiers, as well as compacted glyph identifiers
(unlike characters), are not ordered and do not imply a collating
sequence. Glyph compaction data shall not be intrinsic to a font
resource as defined by ISO/IEC 9541.

The actual representation of the content of the formatted result is
determined by the implementation.

Note 22 It may be represented in accordance with ISO/IEC 10180.
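
One possible reading of the glyph index dictionary, offered as a sketch
and not as the representation the draft intends, is shown below. It
assumes that an "indication of the mapping change" can be modelled as the
start of a new segment with its own table; the function names and the
segment layout are hypothetical.

```python
def compact(glyph_ids, table_size=256):
    """Encode multi-byte glyph identifiers as single-byte indices.

    Returns a list of segments. Each segment is (table, data): table
    lists glyph identifiers in order of first use, and data is a bytes
    object of indices into that table. Starting a new segment stands in
    for an indication of the mapping change. table_size must not
    exceed 256 so that every index fits in one byte.
    """
    segments = []
    table, index, data = [], {}, bytearray()
    for gid in glyph_ids:
        if gid not in index:
            if len(table) == table_size:
                # Table is full: close this segment, start a new mapping.
                segments.append((table, bytes(data)))
                table, index, data = [], {}, bytearray()
            index[gid] = len(table)
            table.append(gid)
        data.append(index[gid])
    if data:
        segments.append((table, bytes(data)))
    return segments


def expand(segments):
    """Recover the original sequence of glyph identifiers."""
    return [table[b] for table, data in segments for b in data]
```

The table order reflects first use only and implies no collating
sequence, in keeping with Note 21. The tables are kept alongside the
formatted result and not inside any font resource.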

A glyph index dictionary used in the formatting process presupposes that
glyphs (as presented) and characters (as input) are not necessarily the
same entities. AFII is committed to providing a map between current and
future multi-byte code standards to be used for document printing.

AFII is currently printing the ISO 10646 code chart, and simultaneously
creating a map as part of this printing process. While doing this it is
also necessary to respond to questions concerning differences between
the characters and glyphs representing different languages as contained
in ISO 10646.

This is not the only service AFII is providing to the standards and
publishing communities. AFII will be generating the first map between
character codes and glyph IDs. As mentioned earlier, AFII is the
registrar for glyph and glyph collection identifiers, providing this
service through an international registry, and considers the definition
and registration of collections an important part of its work.

A collection in the AFII sense is a register of font showings, that is,
the images that are appropriate for language scripts. For example, the
AFII Cyrillic collection (#91) contains the complete complements of
glyphs in current use for rendering of Slavic languages both within and
outside of the Soviet Union, as well as historical glyphs and those used
for representation of those non-Slavic languages typically rendered in
Cyrillic. No code points are attached to any of these glyphs. AFII has
been gathering information with the cooperation of the U.N., and the
present Cyrillic collection represents the glyph complements for 57
languages. An example of a glyph for which information is still wanting
is that known as `fita' (in the pre-1918 Russian alphabet). Information
from the U.N. indicates that this glyph is in current use in 11 languages
(Turkmen, Bashkir, Kirghiz, Buryat, et al.). Thus the name `fita' may be
inadequate, and a new description has been requested from Moscow.

The East Asian languages are also an area of concern, as they represent
the largest single corpus of characters/glyphs to be dealt with. Within
ISO 10646, references are made to several national standards: JIS (Japan),
GB (the PRC), and KS (Korea); however, no code tables are included. AFII
would like to provide a map for those standards, but another step is
necessary. There needs to be an accord on what is the font showing for
East Asian ideographic glyphs. Preliminary font showings have been
prepared and submitted to representatives of the standards communities of
these countries. Feedback has been received from Japan and Korea with
reference to their standards; a response is awaited from the PRC. The
process of reaching an accord among those parties will continue until
none of the shapes are contested. As an example of the degree of
agreement so far, of a font showing of 500 glyphs (from a total of 5000)
provided to the Korean representative, 100 were questioned and are being
redesigned and new IDs assigned.

The important point about this process is that it is guided by logical
rules, which provide that, for each glyph, there must be no question as
to what it means. If, in different dialects, a reader may be misled by
a shape, a new shape will be provided. In the past, the coding people
wanted to minimize the number of codes for purposes, say, of telegraphy,
where addition of more codes could require an additional bit. The goal
of the glyph registry is different: accord without miscommunication.
When the East Asian glyph accord is reached for the several national
standards, a map will exist for national standards referenced in 10646.

AFII will eventually provide and make public a map between the 10646
multi-byte code standard and others, including the referenced national
standards, to be used for document printing.

AFII firmly believes that if character code unification were to be
undertaken, it should be done only after accord is reached on the font
showing. Further, a conformance test using the glyph accord should be
applied to assure communication without misinformation or missing
information.

A conformance test may be as simple as comparing two renderings of the
same document: one specified with DSSSL and printed using the map from
national standard codes to accord glyph IDs, the other produced by
translating to the unification standard and printing with the map from
unification standard codes to accord glyph IDs.
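
The comparison described above might be sketched as follows. This is a
minimal illustration under stated assumptions, not an AFII procedure: the
function name, the three dictionaries, and all code and glyph values used
with it are hypothetical.

```python
def check_conformance(doc, national_map, translation, unified_map):
    """Compare two renderings of a document coded in a national standard.

    doc          -- sequence of national standard codes
    national_map -- national code -> accord glyph ID
    translation  -- national code -> unification standard code
    unified_map  -- unification standard code -> accord glyph ID

    Returns a list of (position, code, direct_glyph, unified_glyph) for
    every position where the renderings differ (misinformation) or where
    either path yields no glyph (missing information). An empty list
    means the document passes.
    """
    failures = []
    for pos, code in enumerate(doc):
        direct = national_map.get(code)
        unified_code = translation.get(code)
        via_unified = unified_map.get(unified_code)
        if direct is None or via_unified is None or direct != via_unified:
            failures.append((pos, code, direct, via_unified))
    return failures
```

With hypothetical data such as `national_map = {"N1": "G1", "N2": "G2"}`,
`translation = {"N1": "U1", "N2": "U2"}` and
`unified_map = {"U1": "G1", "U2": "G9"}`, the document `["N1", "N2"]`
fails at position 1, because the unified path selects a different glyph.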

AFII is committed to the principle of communication without
misinformation or missing information. The program of achieving this
goal will be allowed to take as long as it needs. Progress can be
speeded if (1) there are more knowledgeable participants in this
activity, or (2) more financial support is provided to speed this very
tedious and cumbersome process.

A prospectus on AFII and its goals is available from the chairman of
the organization:
Archie Provan
Rochester Institute of Technology
School of Printing
One Lomb Memorial Drive
Rochester, NY 14623
U.S.A.
716-475-2052

-- Barbara Beeton
Secretary, AFII