dhcs minutes: 10/17

From: Andrea K. Laue (akl3s@cms.mail.virginia.edu)
Date: Thu Oct 18 2001 - 09:44:07 EDT

    Business:

    AL: Sowa, Bringsjord and Kirschenbaum have all confirmed. Trevor Harris
    supposedly will. We can invite one more person. Who?

    Proposed Speakers / Topic

    Kathy Ball -- Georgetown, nlp <http://www.georgetown.edu/cball/cball.html>
    Niels Finnemann -- history of computer/social impacts
    Lev Manovich -- new media aesthetics, <http://www.manovich.net/index.html>
    Martha Blodgett -- information communities in library context, UVa
    Kathy Ryall -- Mitsubishi Lab, interface,
    <http://www.merl.com/people/ryall/>
    Jim French -- navigation and information retrieval, UVa
    <http://www.cs.virginia.edu/~french/>
    Dave Luebke -- graphics, UVa
    <http://www.cs.virginia.edu/brochure/profs/luebke.html>
    Dave Brogan -- graphics, UVa
    <http://www.cs.virginia.edu/brochure/profs/brogan.html>
    Aldona Towner -- is there an nlp person at IBM?
    Ben Shneiderman -- information visualization
    <http://www.cs.umd.edu/~ben/>
    Bill Buxton -- interface <http://www.billbuxton.com/>
    David Noble -- electronic communities, coming here anyway in November,
    maybe just sit-in

    Topic: Data Structures
    Leader(s): John Unsworth and Daniel Pitti

    JU: What we're not talking about today: not talking about very basic data
    structures that computers use to manage data--lists, arrays, hash tables,
    etc.; character sets (?);

    DP: Usually use the term structured information rather than data
    structures.

    AL: To what extent do we want to step back from actual implementations--to
    talk about classification systems, ontologies, logic, etc. before we talk
    about DTDs and databases?

    JM: Historicization is important. We need to talk about classification
    systems from Aristotle on . . . It's a misnomer to state that we're
    talking about structured information; really, we're talking about
    re-structured information.

    JD: How are we to understand the development of database structures in the
    history of set theory? Relational databases first appeared in the
    late-60's and early-70's. Why is this?

    JU: The structures we're talking about assume important things about the
    data: hierarchy, atomic units, relations.

    GR: Would you consider a character set a data structure?

    DP: Computers weren't really interesting to humanists until computers were
    able to name things--to specify their own semantics. To name things and
    to specify structures. Two technologies most frequently used--databases
    and markup languages.

    Not talking about visually or aurally structured information here, just
    text.

    Markup. The _Gentle Introduction_ seems quite ancient now. Written about
    SGML without any knowledge of XML; talks about many things which have now
    been eliminated in XML.

    Email between Allen Renear and Jerry McGann.

    Encoding is an interpretation. A DTD could not be written to capture all
    aspects of a text.

    Important distinction: procedural vs. declarative markup

    The _Gentle_ doesn't delve into the philosophical issues.

    Primary assumption that differentiates markup from relational databases:
    hierarchy. Markup assumes that text is inherently hierarchical.

    JM: Could you say something about standoff markup?

    JD: What kinds of information are appropriate for markup, for databases,
    and for a third category of structures?

    RD: Let's take a step back and talk about theories of representation. It's
    just as important to talk about what's not represented in any structure.
    Let's look at a wider range of knowledge representation, theories of this
    beyond applications in the computer.

    DP: The introduction of computers forces us to concentrate on structures
    that are processable.

    What types of information best fit our two tools, databases and markup?

    Database:
    info. w/ repeating patterns

    Markup Language:
    ordered information--is the order of the objects important?

    WM: What is the "is" of text? Renear says that "text is." How well
    accepted is that?

    WM: Are we relativists or positivists? Are we theorists or practitioners?

    AL: I would like to suggest that we first attempt to be theoreticians.
    And then second, practitioners.

    JU: I suggest that we work first as practitioners and second as
    theoreticians. We should start with constraints.

    TH: Fundamental conflict: representing things for what they are vs.
    representing things for a purpose.

    JU: Ask students to model their families as XML, object-oriented database,
    relational database.
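
    A minimal sketch of the exercise JU proposes, assuming a tiny invented
    family (Ann, with children Bo and Cy). All names, element names, and
    table names are hypothetical. The same question ("who are Ann's
    children?") is asked of three models: nested markup, relational tables,
    and plain Python objects standing in for an object store.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical family data, invented for illustration.

# 1. Markup: the parent/child relation is expressed by nesting.
#    A child with two parents would not nest cleanly in this model.
FAMILY_XML = """
<family>
  <person name="Ann">
    <person name="Bo"/>
    <person name="Cy"/>
  </person>
</family>
"""

def children_from_xml(parent_name):
    root = ET.fromstring(FAMILY_XML)
    for person in root.iter("person"):
        if person.get("name") == parent_name:
            return [child.get("name") for child in person.findall("person")]
    return []

# 2. Relational: people are rows; the relation is its own table.
def children_from_sql(parent_name):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE parent_of ("
        "parent_id INTEGER REFERENCES person(id), "
        "child_id INTEGER REFERENCES person(id))"
    )
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [(1, "Ann"), (2, "Bo"), (3, "Cy")])
    conn.executemany("INSERT INTO parent_of VALUES (?, ?)", [(1, 2), (1, 3)])
    rows = conn.execute(
        "SELECT c.name FROM person p "
        "JOIN parent_of r ON r.parent_id = p.id "
        "JOIN person c ON c.id = r.child_id "
        "WHERE p.name = ? ORDER BY c.id",
        (parent_name,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

# 3. Objects: the relation is a reference held inside each object.
@dataclass
class Person:
    name: str
    children: list = field(default_factory=list)

def children_from_objects(parent_name):
    people = {"Ann": Person("Ann", [Person("Bo"), Person("Cy")])}
    parent = people.get(parent_name)
    return [child.name for child in parent.children] if parent else []
```

    All three functions return the same list for "Ann". What differs is
    where the relation lives: in the nesting, in a separate table, or in
    object references.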

    JD: The classification aspect is one thing we're talking about here, but
    aren't we also talking about metalanguages? How do we talk about the
    languages that we are using?

    JU: What do you mean by metalanguage? Grammars? Semantics?

    JD: We should be aware of what it means to talk about a system of
    representation as a system of representation. These tools assume that
    we're making a system of representations about a system of
    representations. Should we talk about meta structures and what they
    "mean"?

    There are certain assumed ways of entering data into a database or a
    markup language. How do we describe the way we use these?

    JU: Rules of integrity. DTD--you must parse a document according to rules
    of integrity. Relational database--relational algebra enforces rules of
    integrity. Object-oriented databases don't enforce rules of integrity.

    What's the syntax here? And where are the semantics of the syntax
    expressed?

    In object-oriented systems, perhaps you hide the syntax in the methods.
    You don't declare at the beginning very many rules. There's no way to
    tell if the objects are internally consistent.
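
    A rough illustration of the contrast drawn above, under stated
    assumptions: SQLite (via Python's sqlite3 module) stands in for a
    relational database with a declared rule, and an ordinary Python class
    stands in for an object store with no declared rules. The act/scene
    schema is hypothetical. DTD validation is not shown because Python's
    standard library does not include a validating parser.

```python
import sqlite3

def relational_rejects_orphan():
    """Return True if the database refuses a scene that points at no act."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE act (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE scene ("
        "id INTEGER PRIMARY KEY, "
        "act_id INTEGER NOT NULL REFERENCES act(id))"
    )
    conn.execute("INSERT INTO act VALUES (1)")
    conn.execute("INSERT INTO scene VALUES (1, 1)")  # valid: act 1 exists
    try:
        conn.execute("INSERT INTO scene VALUES (2, 99)")  # act 99 does not exist
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()
    return False

class Scene:
    """Plain object: nothing is declared up front, so nothing is checked."""
    def __init__(self, act_id):
        self.act_id = act_id

def object_accepts_orphan():
    """The object is created even though no act 99 exists anywhere."""
    return Scene(act_id=99).act_id
```

    Any consistency check on the Scene objects would have to be written
    into their methods, which is one reading of "you hide the syntax in the
    methods."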

    GR: Process of structuring. How do you go about trying to find or force
    structure on data? There is a moment when you're trying to discover
    structure when exploratory markup is very helpful. That is when you're
    still trying to find the structure. Markup languages like CoCa. (TACT
    uses CoCa.) Developed by Susan Hockey. Used in linguistics.

    TH: CoCa was a late '70's markup language that was developed on a
    particular computer.

    JM: Has a brief description of it in her book.

    GR: Key philosophical thing. "1" is true of "act" until you get to "2."

    <act 1> xxx xxx xxx xxx <act 2> xxx xxx xxx
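
    The line above can be read as a stream in which each tag sets a
    variable that stays in force until it is set again. A minimal sketch of
    that reading follows, assuming tags of the form <name value> as in the
    example; the function name and the regular expression are illustrative
    and not taken from any actual tool.

```python
import re

# Matches state-change tags of the form <name value>, e.g. <act 1>.
TAG = re.compile(r"<(\w+)\s+([^>]*)>")

def read_stateful(text):
    """Pair every word with the variable values in force when it is read."""
    state = {}
    words = []
    position = 0
    for match in TAG.finditer(text):
        # Words before this tag still carry the previous state.
        for word in text[position:match.start()].split():
            words.append((word, dict(state)))
        # The tag changes one variable; nothing is ever "closed".
        state[match.group(1)] = match.group(2).strip()
        position = match.end()
    for word in text[position:].split():
        words.append((word, dict(state)))
    return words
```

    For the input "<act 1> a b <act 2> c", the words a and b are paired
    with act = 1 and the word c with act = 2. No hierarchy is declared in
    advance, and several variables can change independently of one another.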

    SR: Suspicious of distinction between procedural and descriptive.

    JU: Procedures are, in fact, descriptive. But only implicitly
    descriptive.

    SR: XML is a procedural language that doesn't have its procedures defined
    yet.

    DP: Much of what people actually mark up are implicitly procedures. You
    want something to happen to the text.

    TH: You do this with a purpose. You want to call a procedure at the last
    minute, maybe, but you do have an intention.

    DP: 125 different occasions for which italics were used in the OED.
    Should we make 125 tags?

    SR: Almost the same as saying that typography is not semantic. Or layout
    is not semantic. That's just wrong.

    DP: Maybe distinction should be noun vs. verb.

    GR: This method--CoCa--describes a very humanities-oriented method of
    processing the text. XML encourages another, a distanced evaluation of
    the entire structure before entering into it. CoCa describes a linear
    method of reading. It's very hard to do document analysis in a
    hierarchical manner. CoCa is serial.

    TDBSGML -- with a lookup table, you can take a TACT database and convert
    it to a TEI Lite document.
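
    A hedged sketch of the general idea of a lookup-table conversion, not
    of TDBSGML itself: state-change tags of the form <name value> are
    turned into container elements, with a table mapping each variable name
    to an element name. The table contents, the attribute names, and the
    single-level handling are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Matches state-change tags of the form <name value>, e.g. <act 1>.
TAG = re.compile(r"<(\w+)\s+([^>]*)>")

# Hypothetical lookup table: serial variable name -> container element name.
LOOKUP = {"act": "div"}

def to_containers(text, lookup=LOOKUP):
    """Wrap the words following each state change in one container element.

    Handles a single level only; text before the first tag is ignored.
    """
    root = ET.Element("text")
    # With two capture groups, split() yields:
    # [before, name, value, chunk, name, value, chunk, ...]
    parts = TAG.split(text)
    for i in range(1, len(parts), 3):
        name, value, chunk = parts[i], parts[i + 1], parts[i + 2]
        element = ET.SubElement(
            root, lookup[name], {"type": name, "n": value.strip()}
        )
        element.text = " ".join(chunk.split())
    return ET.tostring(root, encoding="unicode")
```

    Each new value implicitly closes the previous container, which is
    where the serial model has to be reconciled with the hierarchical one.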

    JU: Two questions. milestone vs. continuation of truth until stated
    otherwise

    GR: Instead of "closing" tags, you change the value of the variable to
    "off" or "stop."

    Good exploratory markup language doesn't enforce hierarchy from the
    beginning.

    JU: Is this pointing to a difference between a deductive and an inductive
    approach? The inductive approach would use exploratory markup.

    GR: How do we do document analysis as a practice?

    JD: Pedagogical methods. CoCa might be a very good pedagogical tool.
    Doesn't require a pre-existing DTD.

    JU: This points to how comfortable we are about some positivist notions of
    text. We laugh about turning metaphor "off," but yet we have a sense of
    metaphor as a bounded thing.

    What different models or tools might be more appropriate at different
    moments of analysis, different places in the process?

    Or maybe have students mark up a text, feed it to "Fred" or another agent
    that will return the implicit DTD, and see how they structured it.
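
    To make "the implicit DTD" concrete, here is a deliberately crude
    sketch of structure inference. It is not how Fred works; it only
    records, for each element in a well-formed document, which child
    elements were observed, and prints that as a loose DTD-style
    declaration. Attributes, ordering, repetition counts, and mixed content
    are all ignored, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def infer_content_models(xml_text):
    """Return loose DTD-style declarations for the elements actually used."""
    root = ET.fromstring(xml_text)
    observed = {}
    for element in root.iter():
        # Record each distinct child tag in first-seen order.
        children = observed.setdefault(element.tag, [])
        for child in element:
            if child.tag not in children:
                children.append(child.tag)
    declarations = []
    for tag, children in observed.items():
        if children:
            model = "(" + " | ".join(children) + ")*"
        else:
            model = "(#PCDATA)"
        declarations.append(f"<!ELEMENT {tag} {model}>")
    return declarations
```

    Run on a student's markup, the output shows which containment choices
    were made in practice, for example that an act was allowed to contain
    both scenes and songs.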

    RD: Look at theories of text. Then have students try to mark up texts
    according to these different theories.

    JU: In February we'll have Paul Eggert visit and talk about
    JIT--just-in-time markup.

    Email URL about this.
    <http://idun.itsc.adfa.edu.au/ASEC/PWB_REPORT/choice.html>


