9.265 accented characters, lynx, &c.

Humanist (mccarty@phoenix.Princeton.EDU)
Thu, 2 Nov 1995 08:40:07 -0500 (EST)

Humanist Discussion Group, Vol. 9, No. 265.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)
http://www.princeton.edu/~mccarty/humanist/

[1] From: "Eric D. Friedman" <friedman@hydra.acs.uci.edu> (70)
Subject: Re: 9.262 accented characters in lynx?

[2] From: Jim Marchand <marchand@ux1.cso.uiuc.edu> (40)
Subject: Fonts and code pages

[3] From: Maurizio Oliva <Maurizio.Oliva@m.cc.utah.edu> (44)
Subject: Re: 9.262 accented characters in lynx?

[4] From: W Schipper <schipper@morgan.ucs.mun.ca> (11)
Subject: Re: 9.262 accented characters in lynx?

[5] From: "Eric D. Friedman" <friedman@hydra.acs.uci.edu> (69)
Subject: Re: 9.262 accented characters in lynx?

[6] From: David Sewell <dsew@packrat.aml.arizona.edu> (57)
Subject: Re: 9.262 accented characters in lynx?

[7] From: David Sewell <dsew@packrat.aml.arizona.edu> (4)
Subject: Re: 9.262 accented characters in lynx? [2]

[8] From: Lou Burnard <lou@vax.ox.ac.uk> (17)
Subject: RE: 9.262 accented characters in lynx?

--[1]------------------------------------------------------------------
Date: Tue, 31 Oct 1995 19:43:42 -0800
From: "Eric D. Friedman" <friedman@hydra.acs.uci.edu>
Subject: Re: 9.262 accented characters in lynx?

Two things need to be in place for accents to display properly via lynx:
First, the browser must be set to the Latin 1 character set (for French,
German and Spanish: there are other sets for other languages). This
option is easily configured by the user or by the system administrator.
Indeed, given the Web's origins in Switzerland, it's not unusual for
Latin 1 to be the default on many systems. Here's a copy of the Options
screen where this is configured:
Options Menu

E)ditor :
D)ISPLAY variable : NONE
B)ookmark file : lynx_bookmarks.html
F)TP sort criteria : By Filename
P)ersonal mail address :
S)earching type : CASE INSENSITIVE
C)haracter set : ISO Latin 1
V)I keys : OFF
e(M)acs keys : ON
K)eypad as arrows
or Numbered links : Numbers act as arrows
U)ser mode : Novice

The second thing is a little more tricky. The terminal software must
be set to translate Latin 1 characters. On a Macintosh running NCSA
Telnet this is easily done: one simply selects ISO 8859-1 translation
from the Translation menu under the Session pulldown. I leave this on
all of the time, since it has no effect on ordinary characters (those with
only 7 bits). On PC's this can be more difficult (hey, you get what you
pay for), as I have yet to see a terminal program which had ISO 8859-1
translation as an option. On xterms there's no difficulty, but I doubt
those users will be running lynx.
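
A small illustration of why leaving Latin 1 translation on is harmless for plain text: in ISO 8859-1 the 7-bit characters keep their ASCII codes, and only the accented letters use the eighth bit. This is a Python sketch, not anything Lynx or NCSA Telnet runs; the helper name and the sample strings are made up for the example.

```python
def latin1_bytes(text: str) -> bytes:
    """Encode text as ISO 8859-1 (Latin 1), one byte per character."""
    return text.encode("latin-1")

plain = "lynx options"
accented = "été"

# 7-bit text is byte-for-byte identical in ASCII and in Latin 1.
same_as_ascii = latin1_bytes(plain) == plain.encode("ascii")

# Accented letters occupy single bytes with the high (eighth) bit set.
high_bit_bytes = [b for b in latin1_bytes(accented) if b > 0x7F]

print(same_as_ascii)    # True
print(high_bit_bytes)   # [233, 233]
```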

Note that one can write email with accents by following the same
recipe (in Pine, anyway). Turn on ISO 8859-1 translation and set
Pine's character set to iso-8859-1 in the configure menu. From a
Mac you can key accented characters into your mail just as you
would in a word processor, and from an xterm you can use the Meta key.
Again, however, the PC eludes me and not for lack of trying.

If someone comes up with a PC solution to this problem, I'd love to
know what it is.
Best of luck.
Eric Friedman
friedman@uci.edu
Comparative literature
UC Irvine

--[2]------------------------------------------------------------------
Date: Tue, 1 Nov 95 10:42:12 CST
From: Jim Marchand <marchand@ux1.cso.uiuc.edu>
Subject: Fonts and code pages

This is in partial answer to all the problems people have been having sending
things like extended characters and the IPA over networks. You _can_ send
IPA, but only if you and the recipient are on the same page, so to speak.
For example, it would be easy to encode your IPA symbols using a macro within
your word processor &aacute; la SGML, so that you could write &barredi; for
that funny-looking i sound with a bar through it that we Southerners use.
You could furnish your correspondents with a macro which would reverse the
process, _assuming they had the phonetic alphabet available on their
machines_. If you needed names for the phonetic symbols, you could crib them
from _The Unicode Standard_ Version 1.0, vol. 1 (Reading, MA: Addison-
Wesley), p. 36 f. and 188 ff., or make them up, since it is your code. This
is a great book, and it is nice to browse through to see what the future will
bring. It will also teach you about code pages.
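
The macro idea can be sketched in a few lines of Python. Only the name "barredi" comes from the message above; the table, the function names, and the pairing of that name with U+0268 (LATIN SMALL LETTER I WITH STROKE) are assumptions made for illustration. Both correspondents would need the same table.

```python
import re

# Hypothetical name table shared by sender and recipient. Only "barredi"
# appears in the message; mapping it to U+0268 is an assumption.
IPA_ENTITIES = {"barredi": "\u0268"}

def encode_ipa(text, table=IPA_ENTITIES):
    """Sender's macro: replace each known symbol with an &name; reference."""
    for name, symbol in table.items():
        text = text.replace(symbol, f"&{name};")
    return text

def decode_ipa(text, table=IPA_ENTITIES):
    """Recipient's macro: reverse the process; unknown names are left alone."""
    return re.sub(r"&(\w+);", lambda m: table.get(m.group(1), m.group(0)), text)

wire = encode_ipa("b\u0268t")     # pure ASCII, safe to send
print(wire)                       # b&barredi;t
print(decode_ipa(wire) == "b\u0268t")
```

Because each symbol has exactly one name and each name one symbol, the round trip loses nothing.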
On the question of Lynx's limitations: I believe (and I have used it a lot)
that Lynx uses US ASCII (the first 128 characters, two to the seventh = 128).
The reason that Mosaic cannot do IPA symbols is that it uses the so-called
Latin-1 code page, see _The Unicode Standard_, 32 f., 176 ff. A code page
defines the character set which will be used. If you are not on the same
page (pun intended) things will not work out. For example, I download stuff
from Lysator in Sweden, which uses extended characters. If I make the
mistake of then calling the file up in my mail program (NUPOP), every
extended character is rendered as !, which violates the rule of biuniqueness,
so I am stuck. This is why we get so many messages from people who use MIME
encoding and have =20s all over the place.
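
A rough Python sketch of the two behaviours just described. The `bang_out` function is a made-up stand-in for a mail program that shows every extended character as "!" (it is not NUPOP's actual code), and the Swedish sample words are chosen only for the example. The second half uses Python's standard `quopri` module to show quoted-printable, the MIME scheme in which each 8-bit byte is spelled as =XX; =20 is simply the space character (hex 20) in that notation.

```python
import quopri

def bang_out(text):
    """Mimic a mail program that shows every non-ASCII character as '!'."""
    return "".join(ch if ord(ch) < 128 else "!" for ch in text)

# Two different words collapse to the same string, so the mapping
# cannot be reversed: this is the failure of biuniqueness.
collision = bang_out("för") == bang_out("får")

# Quoted-printable keeps the distinction by writing the byte as =XX.
encoded = quopri.encodestring("füllen".encode("latin-1"))
decoded = quopri.decodestring(encoded).decode("latin-1")

print(collision)   # True
print(encoded)     # b'f=FCllen'
print(decoded)     # füllen
```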
If you use Windows NT, you have the code page of the future, Unicode,
available to you, but you still have to encode if you are going to send
something over the Internet, which uses US ASCII. It may look good on your
screen, but it will be garbage to me. Another example: to save paper, I use
LJ2 to print off in small characters, but it uses an old HP character set, so
I have to make a macro to change this into IBM code page 850 before I send it
to the printer. It is not at all friendly; if it does not recognize a
letter, it just leaves it out, so that German fu"llen comes out as fllen.
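
The conversion and the dropped letter can both be imitated in Python. This is a sketch under assumptions: Latin 1 stands in for the source character set (the message does not say which HP set is involved), and the function names are invented for the example.

```python
def latin1_to_cp850(data: bytes) -> bytes:
    """Re-encode Latin 1 bytes as IBM code page 850."""
    return data.decode("latin-1").encode("cp850")

def drop_unknown(text: str) -> bytes:
    """Mimic a device that silently omits characters it does not recognize."""
    return text.encode("ascii", errors="ignore")

word = "füllen"
converted = latin1_to_cp850(word.encode("latin-1"))
dropped = drop_unknown(word)

print(converted)   # b'f\x81llen'  (u-umlaut is 0x81 in code page 850)
print(dropped)     # b'fllen'
```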
A good recent book on all this, but hard reading, is Nadine Kano,
_Developing International Software for Windows 95 and Windows NT_ (Microsoft
Press, 1995). [Windows 95 is not Unicode compliant]. An even better book
(shameless plug) will be my forthcoming (Random House/Ballantine) book _The
Use of the Computer in the Humanities_, where I discuss code pages and fonts
ad nauseam, and even tell you how to make up your own medieval fonts in
TrueType (usable by all Windows apps).
I should point out that much of all this is discussed in font.FAQs.!
Jim Marchand.

--[3]------------------------------------------------------------------
Date: Wed, 1 Nov 1995 12:06:29 -0700 (MST)
From: Maurizio Oliva <Maurizio.Oliva@m.cc.utah.edu>
Subject: Re: 9.262 accented characters in lynx?

A quick try could be:
lynx
o (options)
c (character set)
ISO Latin 1

and then the enter key.

for detailed information refer to:

ISO 8859-1 National Character Set FAQ by Michael K. Gschwind

that you can find at:

ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/FAQ-ISO-8859-1

=-=-=---=-
Maurizio Oliva http://italia.hum.utah.edu/ Maurizio.Oliva@m.cc.utah.edu
Adj. Instructor, Dept. of Lang. and Lit. LNCO 1303, University of Utah,
Salt Lake City, Utah 84112; W 801-585-3008 H 537-7016 F 581-7581

--[4]------------------------------------------------------------------
Date: Tue, 31 Oct 1995 23:31:25 -0330 (NST)
From: W Schipper <schipper@morgan.ucs.mun.ca>
Subject: Re: 9.262 accented characters in lynx?

Willard:

The simplest solution might be to have two flavours available so to
speak, one for a text-only browser like Lynx, in which only a minimal
set of characters is possible; the other for full graphics capability
like Netscape/Mosaic. At the opening page you can then give your
readers a choice to select the text only mode.

Bill

-- 
W. Schipper                         Email: schipper@morgan.ucs.mun.ca
Department of English,              Tel: 709-737-4406
Memorial University                 Fax: 709-737-4528
St John's, Nfld. A1C 5S7

--[6]------------------------------------------------------------------
Date: Tue, 31 Oct 1995 22:18:53 -0700 (MST)
From: David Sewell <dsew@packrat.aml.arizona.edu>
Subject: Re: 9.262 accented characters in lynx?

>A technical question. What determines the ability of lynx to display
>accented characters, and how must (or should) these be represented so that
>the least has to be done by the largest number of users in order to see
>them? [...]

Willard,

Lynx recognizes and will attempt to display most if not all of the characters in the ISO LATIN 1 character set that is used in HTML. You don't need to do a thing differently for lynx users: code e accent-aigu as &eacute; and so forth. I can read French under Lynx with no problem--once everything is configured right. Read on.

There's one crucial setting lynx users must have in their .lynxrc file:

character_set=ISO Latin 1

(I think the stock default is the IBM PC setting, so this needs to be changed.)

What lynx will actually do, given this setting, is to output the appropriate 8-bit ASCII character for each ISO-LATIN character it sees in the HTML. So, for example, &eacute; is output as decimal character 233.
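
The same lookup can be checked with Python's standard `html` module. This is only an illustration of the entity-to-byte mapping, not Lynx's own code; the variable names are made up.

```python
import html
from html.entities import name2codepoint

code = name2codepoint["eacute"]     # numeric code behind the entity name
char = html.unescape("&eacute;")    # the character itself
byte = char.encode("latin-1")       # the single 8-bit byte sent to the terminal

print(code)   # 233
print(byte)   # b'\xe9'
```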

The problem from this point on is, as your query suggests, related to terminal emulation. Lynx cannot guarantee that the decimal 233 character it outputs is going to display properly. For that to happen, a lot of other things have to be true. Assume a user who is running lynx from a Unix shell account accessed by dialing in via a modem, using a communications program--a typical case. Then:

1. The user's PC (local) operating system must be configured to display the full ISO-LATIN character set. (In Windows this is easy; it's a major pain in DOS, involving code-page settings at the DOS level and character translation in the communications program.)
2. The user's communication program must be configured for an 8-bit character display. If it has selectable code pages (as C-KERMIT for OS/2 does, for example), the ISO-LATIN 1 page must be selected.
3. The terminal server at the remote end must be configured for clean 8-bit transmission (these days usually not a problem, but no guarantees...)
4. The Unix terminal driver must be configured for an 8-bit connection (using the "stty" command). In some cases a national-language environment variable must be set.
5. As noted above, the .lynxrc file must have the proper character_set line.
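
To see what goes wrong when one of these links fails, here is a Python sketch that follows byte 233 through two faulty setups. The function names are invented for the example, and code page 437 is picked as a typical DOS default, which is an assumption about the user's machine.

```python
E_ACUTE = 233  # the byte emitted for &eacute; under ISO Latin 1

def strip_high_bit(byte: int) -> int:
    """What a 7-bit connection does: the eighth bit is discarded."""
    return byte & 0x7F

def show_as(byte: int, codepage: str) -> str:
    """What a terminal displays when it reads the byte in a given code page."""
    return bytes([byte]).decode(codepage)

seven_bit = chr(strip_high_bit(E_ACUTE))   # steps 3 or 4 not 8-bit clean
wrong_page = show_as(E_ACUTE, "cp437")     # steps 1 or 2 on the wrong code page
right_page = show_as(E_ACUTE, "latin-1")   # everything configured properly

print(seven_bit, wrong_page, right_page)   # i Θ é
```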

All of these things are doable, but they put far more of a burden on the user than running Netscape (say) does.

That said, I know of one tremendous WWW source of documentation on implementing foreign-language capability for various terminal/comm program situations:

http://www.vlsivie.tuwien.ac.at/mike/i18n.html

See especially the FAQ on the ISO 8859-1 Character Set (which has a lot of how-to's on terminal settings) and the Overview of Nordic Characters, which has some very clear how-to's that apply to French as well as to Nordic characters (skip past the Scandinavian-specific stuff).

The USENET group fr.news.8bits has postings connected with international character issues, if you have access to the fr hierarchy.

--[7]------------------------------------------------------------------
Date: Tue, 31 Oct 1995 22:40:23 -0700 (MST)
From: David Sewell <dsew@packrat.aml.arizona.edu>
Subject: Re: 9.262 accented characters in lynx? [2]

An addendum to the lynx discussion: if you're using a PC running MS-DOS, it is possible to use the IBM PC or 7-bit approximation settings in .lynxrc to avoid some of the ISO LATIN hassle, but the results aren't as good if you're regularly accessing non-English materials.
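
For a sense of what a 7-bit approximation costs, here is a Python sketch that strips accents by Unicode decomposition. It is an assumed stand-in, not Lynx's own translation table, and the function name is made up.

```python
import unicodedata

def seven_bit_approx(text: str) -> str:
    """Approximate text in 7-bit ASCII by dropping accents."""
    # Split each accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Discard the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII becomes '?'.
    return stripped.encode("ascii", errors="replace").decode("ascii")

print(seven_bit_approx("élève à Noël"))   # eleve a Noel
```

The text stays legible, but the accents are gone for good, which matters more the more non-English material one reads.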

--[8]------------------------------------------------------------------
Date: Wed, 01 Nov 1995 12:39:03 +0000
From: Lou Burnard <lou@vax.ox.ac.uk>
Subject: RE: 9.262 accented characters in lynx?

It's got nothing (much) to do with Lynx. It has everything to do with the terminal you're using and what character sets it supports. If (like me, now) you are banging away on a VT320 hooked up to a VAX mainframe, you won't be able to display any character set which uses more than 7 bits. If (like me, at other times) you are banging away on a PC or a Mac or an X-terminal, then you have a choice of various 8 bit character sets or, as IBM people call them, codepages.

In France, as far as I can see, most people use ISO 8859. This works without problem on Unix systems running X and on Macs (with a bit of juggling), on PCs (with quite a lot of juggling) and on VT320s like this one not at all (well, to be fair, it can be tweaked, but it's more trouble than it's worth).

The best solution is, of course, what it says in the TEI Guidelines! Either

(a) make up and document your own simple set of conventions (as per most French-speaking email lists)

or

(b) use entity references
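
Option (b) might look like this in Python, using the standard HTML entity names as the reference set. That choice is an assumption (the TEI Guidelines allow other entity sets), and the function name is made up for the example.

```python
from html.entities import codepoint2name

def to_entities(text: str) -> str:
    """Rewrite non-ASCII characters as entity references, leaving 7-bit text alone."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # ASCII passes through unchanged (a fuller converter would
            # also need to escape '&' itself).
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            # No named entity: fall back to a numeric reference.
            out.append(f"&#{cp};")
    return "".join(out)

print(to_entities("déjà vu"))   # d&eacute;j&agrave; vu
```

The result is pure 7-bit text, so it survives any terminal or mail path, VT320 included.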

bestest

Lou