dhcs minutes: 2/27

From: andrea laue (akl3s@cms.mail.virginia.edu)
Date: Wed Feb 27 2002 - 13:49:11 EST

    Date: February 27
    Topic: natural language processing
    Leader: Steve Ramsay

    Three threads for nlp:

    1: philosophical. language is much messier than you might
    expect. consistent with many other "topics" we've discussed--we
    discover that we don't really know the "traditional" medium until we
    try to encode that medium in an electronic environment. When we try
    to make maps in a computer, we discover we don't really know what
    maps are. When we try to mark up texts, we discover that we have
    many different ideas about what a text is.

    2: historical: arc from Chomsky's structuralism to probabilistic,
    stochastic methods. the second seems to handle the combinatorial
    explosion of ambiguity that characterizes language. (this shift
    aligns with the general shift in AI methods)

    3: pedagogical: probably half the people in the world who call
    themselves "computing humanists" are doing natural language
    processing of some sort. to what degree do we want the program to
    engage that side of the discipline?

    ------

    GR: one article made a crucial distinction: dealing with text as data
    vs. dealing with text as language. if I were teaching this, I would
    stop with z-scores.
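
    For illustration, a minimal sketch of the kind of z-score computation
    GR has in mind, assuming a hypothetical table of relative word
    frequencies (the texts and numbers are invented). The z-score simply
    says how far each text's frequency sits from the corpus mean, in
    standard deviations:

        from statistics import mean, stdev

        # hypothetical relative frequencies (per 1,000 words) of one word in five texts
        freqs = {"textA": 1.2, "textB": 0.3, "textC": 4.8, "textD": 0.5, "textE": 0.9}

        mu = mean(freqs.values())
        sigma = stdev(freqs.values())

        # z-score: how many standard deviations a text sits from the corpus mean
        z = {t: (f - mu) / sigma for t, f in freqs.items()}
        for text, score in sorted(z.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{text}: {score:+.2f}")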

    JM: back to the distinction between rational and statistical methods.
    even in statistical methods, you have to return to some rational
    structure, assumptions, foundations.

    SR: right, and I think where we'll end up is that these two schools of
    methods must be combined. the real debate is probably when we should
    use which. in systems that combine both rational and statistical
    methods, the question is when each is used: where does the statistical
    work better; where does the rational work better?
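
    For illustration, a minimal sketch of what the statistical side of
    that debate looks like in practice (the probabilities are invented,
    not drawn from any real corpus): instead of a grammar ruling readings
    in or out, a toy bigram model ranks competing taggings of an
    ambiguous word--"saw" as noun or verb--by probability.

        # toy disambiguation of "saw" in "I saw her" with invented bigram probabilities

        # P(tag | previous tag), hypothetical values
        transition = {
            ("PRON", "VERB"): 0.6,
            ("PRON", "NOUN"): 0.1,
            ("VERB", "PRON"): 0.5,
            ("NOUN", "PRON"): 0.2,
        }

        # P(word | tag), hypothetical values
        emission = {
            ("saw", "VERB"): 0.02,
            ("saw", "NOUN"): 0.005,
        }

        def score(tags, words):
            """Multiply transition and emission probabilities for one candidate tagging."""
            p = 1.0
            for i in range(1, len(tags)):
                p *= transition.get((tags[i - 1], tags[i]), 1e-6)
                p *= emission.get((words[i], tags[i]), 1.0)
            return p

        words = ["I", "saw", "her"]
        candidates = [["PRON", "VERB", "PRON"], ["PRON", "NOUN", "PRON"]]
        print(max(candidates, key=lambda t: score(t, words)))
        # the verb reading wins under these numbers; a purely rational grammar
        # would instead have to enumerate and filter the readings explicitly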

    GR: to what extent does our current understanding of grammar and its
    units--words, in particular--rely on written notions of language?

    JD: phonetics and phonology and written language are not isomorphic.
    that's a fundamental mistake of this article.

    JM: I hypothesize that ancient, oral Greek is based more on the
    rhythms and pauses of music than on what we currently recognize as
    grammatical units, words in particular.

    JM: Mark Baker's new book, _The Atoms of Language_, investigates the
    similarities across languages. What are the universal principles or
    units that tie all languages together? Back to a rationalist,
    structuralist mode.

    SR: yes, but if so, then what do we know? something about the
    brain? something about culture?

    JD: where does this fit in the program?

    JU: at least the historical information is useful, just so that our
    students have a more complete understanding of the field.

    GR: maybe one of these readings should be supplemented with a piece
    on creating corpora. someone might want to develop a corpus of
    magazine advertisements for women's cosmetics from 1950 to 1970. that
    would provide them with an introduction to the terms and concepts and
    give them something "real" to chew on.
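
    For illustration, a minimal sketch of the first steps such a corpus
    project might involve, assuming plain-text transcriptions of the ads
    in a directory (the directory name and file layout are hypothetical):

        import re
        from collections import Counter
        from pathlib import Path

        # hypothetical directory of plain-text ad transcriptions, one file per advertisement
        corpus_dir = Path("cosmetics_ads_1950_1970")

        token_counts = Counter()
        for ad_file in corpus_dir.glob("*.txt"):
            text = ad_file.read_text(encoding="utf-8").lower()
            tokens = re.findall(r"[a-z']+", text)  # crude tokenization: letters and apostrophes
            token_counts.update(tokens)

        # the most frequent tokens give a first rough look at the corpus's vocabulary
        for word, count in token_counts.most_common(20):
            print(f"{count:6d}  {word}")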

    JD: what about nlp and visualization?

    SR: lots of people doing this?

    GR: you very quickly get into multivariate analysis and very
    complicated mathematics

    JM: there's a huge gap here between natural language and text. what we
    need is a concept like discourse field. what we have are quantum
    hunks: tokens in a condition of uncertainty. there are some
    unidentified but cogent connections between the countable elements.

    SR: the goals of many of these people--the nlp people--are rather
    practical, instrumental: to create devices that allow people to speak
    commands--"lights, turn off." those are different goals.

    JU: one question: what's worth counting here?

    not sure that all of this is so different from textual analysis. if
    you're going to ask a system to mark up a text, the system will have
    to be able to process the language.

    underlying theme of humanities computing--why do we keep trying to
    make computers do things the way people do them? why do we try to
    "humanize" the machines?

    GR: nlp sits between text processing and AI

    GR: turing test. questions about meaning and understanding. these
    are themes that should be returned to in the section on AI.

    why did tests of intelligence begin with testing conversational
    ability?

    maybe we should follow the path from Turing to Searle.

    this also connects to Phil's work on interfaces. many early
    interface studies started with Turing as well.

    JM: is sgml automation or augmentation?

    GROUP: augmentation

    GR: augmentation. at the beginning, it probably was automation. but
    that failed. now we see markup as a method of augmenting the text.

    JU: SGML asks you to augment the text.

    SR: markup augments the text by enabling readers to see more
    patterns.

    GR: augmentation is seeing something you didn't expect to see. not
    necessarily the process of markup itself, but a procedure that can be
    layered on top and that then shows you something about the marked-up
    text(s) that you didn't already know.
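
    For illustration, a minimal sketch of such a procedure layered on top
    of markup, using XML as a stand-in for SGML and a made-up encoding of
    a few lines of dialogue. The markup itself adds nothing the reader
    could not see; the query over it surfaces a pattern (who speaks most)
    at a glance:

        import xml.etree.ElementTree as ET
        from collections import Counter

        # made-up, XML-flavored markup standing in for an SGML-encoded play
        doc = """<play>
          <sp who="Nora"><l>I must stand quite alone.</l></sp>
          <sp who="Helmer"><l>You talk like a child.</l></sp>
          <sp who="Nora"><l>I must think things out for myself.</l></sp>
        </play>"""

        root = ET.fromstring(doc)

        # a procedure layered on top of the markup: count words per speaker
        words_per_speaker = Counter()
        for sp in root.iter("sp"):
            text = " ".join(l.text or "" for l in sp.iter("l"))
            words_per_speaker[sp.get("who")] += len(text.split())

        print(words_per_speaker.most_common())  # [('Nora', 12), ('Helmer', 5)]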

    JU: maybe we should include Richard Powers's _Galatea 2.2_ in the
    curriculum.


