9.292 vagueness in dictionaries

Humanist (mccarty@phoenix.Princeton.EDU)
Mon, 13 Nov 1995 18:50:27 -0500 (EST)

Humanist Discussion Group, Vol. 9, No. 292.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)
http://www.princeton.edu/~mccarty/humanist/

[1] From: wulfric@epas.utoronto.ca (54)
Subject: Dictionary fuzziness

Vagueness/fuzziness

Most of the recent discussion of vagueness (or fuzziness, in French
"flou") has been in relation to literary texts. My own field is
metalexicography, where the corpus of study, dictionaries, is,
contrary to popular belief, full of fuzziness. Often referred to as
the Bible of language usage, the dictionary, like the Bible,
contains text that is open to multiple interpretations.

The Scottish Presbyterian rigour of James Murray made the task of
standardizing the OED for computerization a relatively easy one. The
task of doing the same for what many francophones consider its
French counterpart, the Tresor de la langue francaise (or TLF), is
immeasurably greater. The algorithm developed by Jacques Dendien
(INaLF, Nancy) for the automatic information-field and information-
field-interdependency parsing of the TLF works reasonably well for
the volume that served as a testing ground (vol. 14), but is either
unsatisfactory or downright hopeless for the others (there are 16 in
all).

[Among the potential areas of fuzziness are the boundaries
between Etymology (historical meaning) and Definition (functional
meaning), Definition (metalanguage) and Synonym (language),
Definition (langue) and Example (discours), Definition (word) and
Encyclopedic commentary (referent), Macrostructure (ordering of
entries) and Microstructure (organisation of information within
entries).]

In early dictionaries, as might be expected, the degree of
fuzziness/ambiguity is generally, though not always, greater than in
modern ones.

As other contributors to the discussion have stated, the traditional
view of computer applications, derived from Western logic and
mathematics, imposes a need to tag, tag, tag, and thus to distort the
text, whether literary or lexicographical. Tagging can be helpful for
the individual researcher, who does it for exploratory purposes (it
can remain dynamic), but it becomes dictatorial when frozen in a
distributed database; whence the false sense of security given by the
arbitrary field labelling of the dictionary, whether printed or
electronic.

My own way of coping with information fields in early dictionaries
is to develop lemmatized lists of metalinguistic keywords based on
terms used in the text, such as "SIGNIFIE" ('means': definition
copula), "VIENT DE" ('comes from': etymology copula), "COMME" ('as':
example copula), "VIEUX" ('old': usage label), "FEMININ" ('feminine':
gender), etc. These allow the user to do a fuzzy search on
information fields: a large, representative number of definitions,
etymologies, examples, etc. can thereby be retrieved (never all of
them), accompanied by varying amounts of discardable noise.
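
For what it is worth, here is a minimal sketch of how such
keyword-driven retrieval might be implemented. Only the five keywords
are taken from the text above; the entry format, the grouping of
keywords by field, and the function names are my own illustrative
assumptions, not the actual tools described.

    # Sketch: "fuzzy" retrieval of information fields from entries of an
    # early French dictionary, driven by lemmatized metalinguistic keywords.
    # Entry format and all names are illustrative assumptions.
    import re

    # Keywords grouped by the information field they usually signal
    # (the five examples given above).
    FIELD_KEYWORDS = {
        "definition": ["SIGNIFIE"],   # 'means'
        "etymology":  ["VIENT DE"],   # 'comes from'
        "example":    ["COMME"],      # 'as'
        "usage":      ["VIEUX"],      # 'old'
        "gender":     ["FEMININ"],    # 'feminine'
    }

    def find_field(entries, field):
        """Return (headword, text) pairs that probably contain the field.

        The match is deliberately fuzzy: any entry containing one of the
        field's keywords is returned, so a large, representative set is
        retrieved (never all instances), along with some noise that the
        user must discard by hand.
        """
        patterns = [re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
                    for kw in FIELD_KEYWORDS[field]]
        return [(hw, text) for hw, text in entries
                if any(p.search(text) for p in patterns)]

    if __name__ == "__main__":
        # Toy entries written in the style of an early dictionary (invented).
        entries = [
            ("ACADEMIE", "ACADEMIE, s.f. Signifie une compagnie de gens de "
                         "lettres. Vient de l'italien."),
            ("ABAISSER", "ABAISSER, v.a. Se dit comme mettre plus bas."),
        ]
        for hw, text in find_field(entries, "etymology"):
            print(hw, "->", text)

On a real corpus the keyword lists would of course be much longer and
lemmatized, as described above.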

By the way, non-English material on Humanist (French, Italian, etc.)
seems to present difficulties for Gopher searching when diacritic
substitutions are used. I can find "Academie" but not "Acad/emie"
(unless I am prepared to plough through all "acad" hits, such as
"academic"). As a new member of Humanist I imagine this has already
been the subject of debate.
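
A crude normalization of the slash substitutions before matching would
make the two forms equivalent. The sketch below assumes, as in the
example above, that "/" immediately precedes the letter carrying the
diacritic; it is purely illustrative and says nothing about how the
Gopher server actually works.

    # Sketch: strip slash-style diacritic substitutions (e.g. "Acad/emie")
    # before matching, so a query for "Academie" also finds the marked form.
    # Assumes "/" immediately precedes the accented letter; illustrative only.
    import re

    def strip_marks(text):
        """Remove a "/" that marks the following letter as accented."""
        return re.sub(r"/(?=[A-Za-z])", "", text)

    def matches(query, text):
        """Case-insensitive whole-word match on the normalized forms."""
        q = re.escape(strip_marks(query))
        return re.search(r"\b" + q + r"\b", strip_marks(text),
                         re.IGNORECASE) is not None

    print(matches("Academie", "l'Acad/emie fran/caise"))  # True
    print(matches("Academie", "academic standards"))      # False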

Russon Wooldridge
University of Toronto