Date: February 27
Topic: natural language processing
Leader: Steve Ramsay
Three threads for nlp:
1: philosophical. language is much more messy than you might
expect. consistent with many other "topics" we've discussed--
discover that we don't really know the "traditional" medium when we
try to encode that medium in an electronic environment. When we try to
make maps in a computer, we discover we don't really know what maps
are. When we try to markup texts, we discover that we have many
different ideas about what a text is.
2: historical: arc from Chomsky's structuralism to probabilistic,
stochastic methods. the second seems to handle the combinatorial
explosion of ambiguity that characterizes language. (this shift
aligns with the general shift in AI methods)
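The contrast between rule-based and probabilistic methods can be made concrete with a toy sketch (the corpus, tags, and counts below are all invented for illustration): instead of a hand-written rule, counted tag bigrams decide how to tag an ambiguous word like "duck."

```python
from collections import Counter

# Tiny hand-labeled "corpus" of (word, part-of-speech) pairs.
# All data here is invented for illustration.
corpus = [
    ("the", "DET"), ("duck", "NOUN"), ("swims", "VERB"),
    ("ducks", "NOUN"), ("duck", "VERB"), ("when", "ADV"),
    ("they", "PRON"), ("duck", "VERB"), ("low", "ADV"),
    ("the", "DET"), ("duck", "NOUN"),
]

# Count how often each word carries each tag, and how often
# one tag follows another (a bigram model of tag sequences).
word_tag = Counter(corpus)
tags = [t for _, t in corpus]
tag_bigrams = Counter(zip(tags, tags[1:]))

def likelier_tag(prev_tag, word):
    """Pick the tag for `word` that maximizes
    count(prev_tag -> tag) * count(word, tag)."""
    candidates = [t for (w, t) in word_tag if w == word]
    return max(candidates,
               key=lambda t: tag_bigrams[(prev_tag, t)] * word_tag[(word, t)])

# "duck" is ambiguous (NOUN or VERB); the counts, not a rule, decide.
print(likelier_tag("DET", "duck"))   # after a determiner
print(likelier_tag("PRON", "duck"))  # after a pronoun
```

The point of the sketch is the one made above: no rule says "duck after a determiner is a noun"; the disambiguation falls out of frequencies in the data.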
3: pedagogical: probably half the people in the world who call
themselves "computing humanists" are doing natural language
processing of some sort. to what degree do we want the program to
engage that side of the discipline?
------
GR: one article made a crucial distinction: dealing with text as data
vs. dealing with text as language. if I were teaching this, I would
stop with zed scores.
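"Zed scores" here are z-scores, a staple of computational text analysis: a word's frequency in one text is expressed as the number of standard deviations it sits from its mean across the corpus. A minimal sketch, with invented texts and frequencies:

```python
import statistics

# Relative frequencies of one word (per 1,000 words) in five texts
# of a toy corpus -- numbers invented for illustration.
freqs = {"textA": 61.0, "textB": 58.5, "textC": 72.3,
         "textD": 60.1, "textE": 59.8}

mean = statistics.mean(freqs.values())
sd = statistics.stdev(freqs.values())  # sample standard deviation

# z-score: how many standard deviations each text sits from the mean.
z = {name: (f - mean) / sd for name, f in freqs.items()}

for name, score in sorted(z.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:+.2f}")
```

A text with a large positive z-score overuses the word relative to the corpus; this is the kind of "text as data" result GR suggests stopping at.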
JM: back to the distinction between rational and statistical methods.
even in statistical methods, you have to return to some rational
structure, assumptions, foundations.
SR: right, and I think where we'll end up is that these two schools of
methods must be combined. the real debate is probably when we should
use which. in systems that combine both rational and statistical
methods, the question becomes which system to use when: where does the
statistical work better; where does the rational work better?
GR: to what extent does our current understanding of grammar and units--
words, in particular--rely on written notions of language?
JD: phonetics and phonology and written language are not isomorphic.
that's a fundamental mistake of this article.
JM: I hypothesize that ancient, oral Greek is based more on the rhythms
and pauses of music than on what we currently recognize as grammatical
units, words in particular.
JM: Mark Baker's new book, _The Atoms of Language_, investigates the
similarities across languages. What are the universal principles or
units that tie all languages together? Back to a rationalist,
structuralist
mode.
SR: yes, but if so, then what do we know? something about the
brain? something about culture?
JD: where does this fit in the program?
JU: at least the historical information is useful, just so that our
students have a more complete understanding of the field.
GR: maybe one of these readings should be supplemented with a piece
on creating corpora. someone might want to develop a corpus of
magazine advertisements for women's cosmetics from 1950 - 1970. that
would provide them with an introduction to the terms and concepts and
give them something "real" to chew on.
JD: what about nlp and visualization?
SR: lots of people doing this?
GR: very quickly get into multivariate analysis (MVA) and very
complicated mathematics
JM: huge gap here between natural language and text. what we need is a
concept like discourse field. what we have are quantum hunks: tokens
in a condition of uncertainty. there are some unidentified but
cogent connections between the countable elements.
SR: the goals of many of these people--the nlp people--are rather
practical, instrumental: to create devices that allow people to speak
commands--"lights, turn off." different goals.
JU: one question: what's worth counting here?
not sure that all of this is so different from textual analysis. if
you're going to ask a system to mark up a text, the system will have
to be able to process the language.
underlying theme of humanities computing--why do we keep trying to
make computers do things like people? why do we try to "humanize"
the machines?
GR: nlp sits between text processing and AI
GR: turing test. questions about meaning and understanding. these
are themes that should be returned to in the section on AI.
why did tests of intelligence begin with testing conversational
ability?
maybe we should follow the path from Turing to Searle.
this also connects to Phil's work with interface. many early
interface studies started with Turing as well.
JM: is sgml automation or augmentation?
GROUP: augmentation
GR: augmentation. at the beginning, it probably was automation. but
that failed. now we see markup as a method of augmenting the text.
JU: SGML asks you to augment the text.
SR: markup augments the text by enabling readers to see more
patterns.
GR: augmentation is seeing something you didn't expect to see. not
necessarily the process of markup itself, but a procedure layered on
top that then shows you something about the marked-up text(s) that you
didn't already know.
JU: maybe we should include Richard Powers _Galatea 2.2_ in the
curriculum
This archive was generated by hypermail 2b30 : Wed Feb 27 2002 - 13:49:17 EST