9.321 ambiguity in dictionaries

Humanist (mccarty@phoenix.Princeton.EDU)
Mon, 27 Nov 1995 18:30:07 -0500 (EST)

Humanist Discussion Group, Vol. 9, No. 321.
Center for Electronic Texts in the Humanities (Princeton/Rutgers)
http://www.princeton.edu/~mccarty/humanist/

[1] From: wulfric@epas.utoronto.ca (37)
Subject: Examples of ambiguity in dictionaries; the
straitjacket of net searches

Marta Steele (9.268) asks for examples of ambiguity. In that poetry is
expected to be ambiguous and dictionaries not, I will offer one from the
latter. Also one from web searching.

The, perhaps not inappropriately chosen, Random House Unabridged Dictionary
(2nd ed., 1993) gives for one of the senses of <i> ambiguous </i> "<i> Ling.
</i> [...] having two or more structural descriptions" and for <i> analytic
</i> "(of a language) characterized by a relatively frequent use of function
words, auxiliary verbs, and changes in word order to express syntactic
relations, rather than of inflected forms". In other words, the authors of
RHUD chose to label <i> ambiguous </i> as a technical term, but left <i>
analytic </i> as unmarked usage. Lexicographers have continually to deal with
the possibility of multiple structural descriptions, in this case the fuzzy
border between technical and non-technical usage. Each dictionary tends to
label borderline cases differently from the others, and each dictionary has
its idiosyncratic list of labels.

Tagging these two items for the creation of an electronic database (one exists
for RHUD - the CD-ROM was given away with the printed dictionary) is of course
simple: "(of a language)" either goes into the definition field (rudimentary
field discrimination) or is treated as a collocate class type (sophisticated
field discrimination); "<i> Ling. </i>" will be treated as a marked usage
label. However, only a full-text search will find both; a database search by
field will find either one or the other, depending on the selected field, but
not both.
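
To illustrate the point, here is a minimal sketch in Python. The record
layout, the field names ("label", "definition") and the two search terms are
assumptions made for the illustration; they are not the actual structure of
the RHUD CD-ROM database. The definitions are abridged from the quotations
above.

```python
# Hypothetical, simplified records. The field names are assumptions for
# illustration only, not the schema of the RHUD electronic edition.
ENTRIES = [
    {
        "headword": "ambiguous",
        "label": "Ling.",
        "definition": "having two or more structural descriptions",
    },
    {
        "headword": "analytic",
        "label": "",
        "definition": "(of a language) characterized by a relatively "
                      "frequent use of function words",
    },
]

# Two ways in which a dictionary may signal "this is a term of linguistics".
TERMS = ["ling.", "language"]


def search_field(entries, field, terms):
    """Return headwords whose chosen field contains any of the terms."""
    return [
        e["headword"]
        for e in entries
        if any(t in e[field].lower() for t in terms)
    ]


def search_full_text(entries, terms):
    """Return headwords where any field at all contains any of the terms."""
    return [
        e["headword"]
        for e in entries
        if any(t in value.lower() for value in e.values() for t in terms)
    ]


print(search_field(ENTRIES, "label", TERMS))       # label field only
print(search_field(ENTRIES, "definition", TERMS))  # definition field only
print(search_full_text(ENTRIES, TERMS))            # every field
```

Under these assumptions, the search restricted to the label field returns
only "ambiguous", the search restricted to the definition field returns only
"analytic", and the full-text search returns both.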

My other example concerns characters that are interpreted by software EITHER
as delimiters or metacharacters OR as word characters, e.g. "/", "\". If I
write "Acad/emie" the human reader will interpret "/e" as acute "e"; some
programs will make that particular translation, others will cleverly ignore
the slash (thus "Academie", still perfectly understandable to the human
reader), while most will regard "/" as a delimiter/metacharacter and thus not
allow me to find all occurrences of the word. Such, I believe, is the case of
the Gopher searcher used for Humanist.
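
The three behaviours can be sketched in Python as follows. The function names
are hypothetical, and the third function is only an assumption about how a
delimiter-based indexer might behave; it is not a description of the Gopher
searcher itself.

```python
import re


def translate_slash(text):
    """Treat "/e" as a substitute for e-acute (one mapping only, as an example)."""
    return text.replace("/e", "\u00e9")


def ignore_slash(text):
    """Drop the slash and keep the word in one piece."""
    return text.replace("/", "")


def split_on_slash(text):
    """Treat "/" as a delimiter, as whitespace is, and return the tokens."""
    return [token for token in re.split(r"[/\s]+", text) if token]


word = "Acad/emie"

print(translate_slash(word))  # the slash read as an accent
print(ignore_slash(word))     # the slash silently dropped
print(split_on_slash(word))   # the slash read as a word boundary

# An index built from the split tokens holds no entry for the whole word,
# so a search for "Academie" finds nothing.
index = set(split_on_slash(word))
print("Academie" in index)
```

With the third behaviour the index contains "Acad" and "emie" as separate
tokens, so neither "Academie" nor "Acad/emie" can be found as a single word.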

One lesson to be drawn from this is that if one wants to make the contents of
electronic discussions or references in languages other than English truly
accessible, diacritics and diacritic substitutions should not be used - this
aside from linguistic considerations that have fuelled much discussion on
Humanist. Therefore "Academie" and not "Acad/emie", please.

Russon Wooldridge
University of Toronto