14.0277 methodological primitives

From: by way of Willard McCarty (willard@lists.village.Virginia.EDU)
Date: 09/27/00


                   Humanist Discussion Group, Vol. 14, No. 277.
           Centre for Computing in the Humanities, King's College London
       [1]   From:    Wendell Piez <wapiez@mulberrytech.com>             (125)
             Subject: Re: 14.0272 methodological primitives
       [2]   From:    "Ian Lancashire" <ian@chass.utoronto.ca>            (89)
             Subject: Re: 14.0272 methodological primitives
       [3]   From:    "Osher Doctorow" <osher@ix.netcom.com>              (42)
             Subject: Re: Methodological primitives
             Date: Wed, 27 Sep 2000 09:34:28 +0100
             From: Wendell Piez <wapiez@mulberrytech.com>
             Subject: Re: 14.0272 methodological primitives
    Hi Willard and HUMANIST:
    At 07:27 AM 9/26/00 +0100, John Bradley wrote:
      >In Object Oriented (OO) design, there is another way to design
      >processing which is these days very much in fashion.  One perhaps key
      >difference: Object Oriented design blurs the distinction Willard made
      >in his first posting on this subject between data and process, and I
      >think this makes a dramatic difference in the way one looks at the
      >whole issue. It seems particularly well suited for modelling
      >processes that involve the production of "interactive" and
      >"GUI-based" systems. I don't know of anyone, however, who has managed
      >to take OO design and apply it in quite the way implied here -- as a
      >basis for the construction of primitives that non-programmers could
      >adapt for specific tasks.
    This is fascinating stuff. John's point about the underlying assumption in
    OO design -- to merge the conception, in modeling, of data and process -- is
    very well taken. It's especially interesting in this context because as
    these systems evolve, naturally, the old ideas and approaches come up time
    and again. In the context of OO (especially, say, Java, with its promise of
    portability and the long-term robustness that comes with
    platform-independence), we see the pendulum swing back again with the
    emergence of markup-based (specifically XML-based) systems.
    A key reason OO approaches work well for interactive GUIs and other
    process-intensive work is, in fact, that even while they can support
    strongly encapsulated architectures (more easily modified and maintained)
    OO programs can take shortcuts to achieve functionality, at the price of
    locking in their data to a particular data model (and hence, usually, a
    particular format). But who wants to be storing their conference papers as
    Java objects? Of course, the next step is to abstract and formalize a
    portable data model outside the implementation, pulling data back away from
    process (at least to whatever extent is possible). By providing a standard
    syntax supporting off-the-shelf tools, XML eases this work greatly.
    In the business, we've used the analogy, "if Java is a way to build your
    toaster, then XML is sliced bread." This tries to identify a key advantage
    of a standards-based markup syntax: that, in theory at least (and
    increasingly in practice), it should now be possible to use OO languages to
    work in the way they want -- with sophisticated data models (not merely
    streams of characters) -- and yet not lock our data into the specific
    processing environment we happen to be using at the moment.
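    A minimal sketch of that separation, in Python rather than Java and
    using only the standard library. The Paper class, its two fields and
    the element names are hypothetical, chosen for illustration; the point
    is only that the object can be rebuilt from a plain XML string that any
    other tool could also read.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical in-memory object; field names are illustrative only."""
    title: str
    author: str

    def to_xml(self) -> str:
        # Write the object out as markup that does not depend on this class.
        root = ET.Element("paper")
        ET.SubElement(root, "title").text = self.title
        ET.SubElement(root, "author").text = self.author
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "Paper":
        # Rebuild an object from the portable form.
        root = ET.fromstring(text)
        return cls(title=root.findtext("title"), author=root.findtext("author"))


paper = Paper("Methodological primitives", "A. Humanist")
serialized = paper.to_xml()
restored = Paper.from_xml(serialized)
print(serialized)
```

    The serialized string, not the Python object, is what would be stored
    and exchanged; a different program in a different language could parse
    the same markup into its own data model.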
      >Any tool meant to support activities as diverse as those that turn up
      >in humanities text-based computing cannot possibly be trivial to
      >learn or use. The level of professionalism and commitment required
      >for a full use of TuStep is, I think, roughly comparable to that
      >required to learn to work with, say, Perl, or (I think) Smalltalk and
      >text-oriented Smalltalk objects.
    I think that's fair, since any toolset whose native data set is a file
    containing a stream of characters must work on that basis, inferring more
    complex data structures where it can (by parsing), but not assuming in the
    general case that those particular data structures are there in that form.
    For one thing, in Humanities Computing as it stands, it's fair to assume
    they're not.
    In order to build a more "intuitive" system (say, a GUI-driven system
    allowing on-the-fly manipulation of texts), a more sophisticated data model
    needs to be assumed that can support more complex operations in a
    generalized way. To go about "sorting entries in Swedish lexicon order" or
    "sorting entries in Icelandic name order", the system has to know both what
    these orders are and what an "entry" is. XML, by providing for a
    particular kind of tree-structure, is beginning to provide at least an
    infrastructure within which such knowledge is embedded, so we can now begin
    to use standard syntaxes such as XPath (co-designed by a Computing
    Humanist, Steve DeRose) for some of this. (XPath can't sort, but it can do
    some other fancy stuff such as filter by content, so that
    '//line[contains(., "To be")]' will return all <line> elements in a
    document that contain the string "To be".) Consequently, we are beginning
    to see some of these capabilities emerging as XML tools.
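    The two kinds of knowledge mentioned above can be sketched in Python
    with the standard library alone. Its ElementTree module supports only a
    subset of XPath that does not include contains(), so the filter is
    written out by hand as a counterpart to the expression quoted above.
    The sample document, the <entry> element and the simplified alphabet
    (a-z followed by å, ä, ö, ignoring finer collation rules such as the
    handling of w or of accented letters) are assumptions made for
    illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
SAMPLE = """<doc>
  <line>To be, or not to be, that is the question:</line>
  <line>Whether 'tis nobler in the mind to suffer</line>
  <entry>öl</entry>
  <entry>zon</entry>
  <entry>ål</entry>
  <entry>apa</entry>
  <entry>äpple</entry>
</doc>"""

# Simplified Swedish alphabet: a-z followed by å, ä, ö (an assumption).
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def elements_containing(root, tag, needle):
    """Hand-written counterpart of //tag[contains(., needle)]."""
    return [el for el in root.iter(tag) if needle in "".join(el.itertext())]


def swedish_key(word):
    """Rank each letter by SWEDISH_ALPHABET; unknown characters sort last."""
    return [RANK.get(ch, len(RANK)) for ch in word.lower()]


def sorted_entries(root, tag="entry"):
    """Knowing what an 'entry' is (the tag) and what the order is (the key)."""
    texts = ["".join(el.itertext()) for el in root.iter(tag)]
    return sorted(texts, key=swedish_key)


root = ET.fromstring(SAMPLE)
hits = elements_containing(root, "line", "To be")
print([el.text for el in hits])
print(sorted_entries(root))
```

    A plain character-code sort would place "ål" and "äpple" by their code
    points rather than by the alphabet; the explicit key is where the
    knowledge of the order lives, and the tag name is where the knowledge
    of what counts as an entry lives.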
    For example, Sun Microsystems has an "EAI" product (the TLA stands for
    "Enterprise Application Integration") called Forte Fusion (that 'e' has an
    accent mark that I don't trust your mailer to render) that allows a user to
    set up a data process flow chain in which an XML data set can be passed
    through a series of processors, including, prominently, XSLT
    transformations that could be doing filtering, sorting, analytical work.
    The idea is that when you click on the form to submit your order for the
    new American Civil War battle game, your order can be parsed, and the
    Authorization, Shipping and Billing departments at NorthernAggression.com
    can all get the appropriate pieces of your order (some of which might
    already be in the system since you're a regular) in a timely way, following
    whatever internal logic is required (e.g. don't send the game out if your
    credit card bounces again). The whole thing works with a GUI: little icons
    represent your filtering and processing engines, with, as it were, a pipe
    carrying the data between them. The different engines can be disparate,
    running on different systems and platforms, a Unix server here running a
    batch program in Perl, an XSLT transform on a client over there, and so forth.
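    The shape of such a process chain can be sketched in a few lines of
    Python. This is not the Sun product's interface, which is not shown
    here; the order document, the stage names and run_pipeline are all
    hypothetical, and every stage runs in one process rather than on
    separate machines. Each stage takes an XML element and returns one, so
    stages can be chained in any order.

```python
import xml.etree.ElementTree as ET
from functools import reduce

# Hypothetical order document; element and attribute names are illustrative.
ORDER = """<order>
  <customer>A. Reader</customer>
  <item sku="B2" qty="1">Battle game</item>
  <item sku="A1" qty="0">Cancelled item</item>
  <item sku="A7" qty="2">Map pack</item>
</order>"""


def drop_empty_items(order):
    """Filtering stage: remove items whose quantity is zero."""
    for item in list(order.findall("item")):
        if int(item.get("qty", "0")) == 0:
            order.remove(item)
    return order


def sort_items_by_sku(order):
    """Sorting stage: reorder the <item> children by their sku attribute."""
    items = sorted(order.findall("item"), key=lambda el: el.get("sku"))
    for item in items:
        order.remove(item)
    order.extend(items)
    return order


def add_item_count(order):
    """Analytical stage: record how many items remain on the order."""
    order.set("items", str(len(order.findall("item"))))
    return order


def run_pipeline(document, stages):
    """Pass the document through each stage in turn, like data in a pipe."""
    return reduce(lambda doc, stage: stage(doc), stages, document)


result = run_pipeline(
    ET.fromstring(ORDER),
    [drop_empty_items, sort_items_by_sku, add_item_count],
)
print(ET.tostring(result, encoding="unicode"))
```

    Every stage here depends on knowing that an order contains <item>
    elements with sku and qty attributes, which is a small instance of the
    stable data model such a chain needs.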
    But to build something like this, you have to have a fairly stable data
    model. (In this case, the system is going to do special things with your
    name, address, credit card number etc.) At this stage, it is too early to
    say when such a data model will be possible or feasible for the kind of
    analysis we want to do in Humanities Computing -- especially considering we
    commonly work at the level of the "word" (whatever that is), not just
    element types, and want access to orthographical variants, morphologies,
    synonyms, etc. etc., intelligence about all of which has to be stored
    somewhere in some sufficiently tractable (and long-lived) form. Not to
    mention the problem of sense-disambiguation (I love Prof. Ott's bit about
    the "content provider" becoming a "satisfied donor"). Our work with
    higher-level linguistic and literary structures has barely started.
    Also, to be an iconoclast about it, I am not sure it is our best course to
    move forward pell-mell in this direction, without being extremely critical
    of the task itself. Every lens comes with its blindness, and as we design
    these capabilities into systems, by deciding what we want to look at, we
    will also be deciding what we don't care to see. I am very much in favor of
    experimental work to design and deploy whatever higher-level structures we
    can discern, trace, render malleable with these powerful tools. But I also
    believe that great works of literature will continue to evade whatever
    structures we impose on them, just as they always have, it being the
    primary work of every poet to reinvent the art of poetry from scratch.
    And not only for ourselves should we be wary, but for the role we have to
    play in the larger world's understanding of its own rhetorics and how they
    work. It does little good to say when the Emperor has no clothes, if you
    haven't been taking care of your own wardrobe.
    So, while I'm not going to be quitting work myself on methodological
    primitives, I'm not confident that you're going to see them anytime soon in
    a form that a naive user, without knowledge of sordid details of text
    encoding, could simply sit down, tinker with and have instantly useful and
    trustworthy results. "Epiphany In a Box"? Which is a good thing. After all,
    isn't it our role to show the naive user what's *really* going on?
    Best regards,
    Wendell Piez                            mailto:wapiez@mulberrytech.com
    Mulberry Technologies, Inc.                http://www.mulberrytech.com
    17 West Jefferson Street                    Direct Phone: 301/315-9635
    Suite 207                                          Phone: 301/315-9631
    Rockville, MD  20850                                 Fax: 301/315-8285
        Mulberry Technologies: A Consultancy Specializing in SGML and XML
             Date: Wed, 27 Sep 2000 09:35:10 +0100
             From: "Ian Lancashire" <ian@chass.utoronto.ca>
             Subject: Re: 14.0272 methodological primitives
    The best set of text-based utilities can be found in UNIX, the next best in
    Dan Melamed's perl tools at http://www.cis.upenn.edu/~melamed/ . The 1980s
    Hum was a gem too. It still is. Susan Hockey, as usual, was prescient: she
    saw the need for someone in the humanities to learn basic programming and to
    assemble groups of these "primitives." Her fine book on Snobol programming
    enables the altruistic humanist still. Earlier, Nancy Ide published a book
    on Pascal for the humanities. (This isn't meant to be an exhaustive list.)
    Maybe one of the unforeseen effects of relying on professional programmers
    to create big pieces of software like TACT and Wordcruncher is to encourage
    scholars in the humanities to believe that they can get along without being
    able to write small programs or adapt ones created by other people. (This
    too is a debate I have overheard intermittently over several decades.)
    Ott's comments on the impediments to releasing primitives that would satisfy
    all and sundry come from an expert programmer. The world of cybertext is too
    complex now. We will also never all agree on how, for whatever purpose, to
    symbolize the more fundamental primitives embedded in any programming
    Ian Lancashire
    ----- Original Message -----
    From: Humanist Discussion Group
    <willard.mccarty@kcl.ac.uk>) <willard@lists.village.virginia.edu>
    To: Humanist Discussion Group <humanist@lists.Princeton.EDU>
    Sent: Tuesday, September 26, 2000 2:27 AM
      >                Humanist Discussion Group, Vol. 14, No. 272.
      >        Centre for Computing in the Humanities, King's College London
      >                <http://www.princeton.edu/~mccarty/humanist/>
      >               <http://www.kcl.ac.uk/humanities/cch/humanist/>
      >          Date: Tue, 26 Sep 2000 07:16:14 +0100
      >          From: John Bradley <john.bradley@kcl.ac.uk>
      >          Subject: Re: 14.0258 methodological primitives?
      > Willard: I would certainly support anyone who took the view that
      > Wilhelm Ott's TuStep system provides a very solid set of "primitives"
      > for the scholarly manipulation of text.  I have spent many hours of
      > time examining their design (although I confess that my actual
      > experience of using them has been very limited indeed) and can well
      > appreciate that they could be combined to deal with a very large
      > number of text manipulation needs.  Anyone seriously interested in
      > thinking about what a design needs to include in detail would benefit
      > much from examining TuStep in this way.
      > The approach towards tools for generalised processing shown in TuStep
      > is, from the computing perspective, a very old one -- but at the same
      > time it is a model that is still often applied when a computing
      > professional needs to do a complex computing task him/herself.  The
      > UNIX environment with its basic "filtering" tools, a sorting
      > program, some programmable text-oriented editors, and things like
      > Perl, are based in very similar approaches.
      > In Object Oriented (OO) design, there is another way to design
      > processing which is these days very much in fashion.  One perhaps key
      > difference: Object Oriented design blurs the distinction Willard made
      > in his first posting on this subject between data and process, and I
      > think this makes a dramatic difference in the way one looks at the
      > whole issue. It seems particularly well suited for modelling
      > processes that involve the production of "interactive" and
      > "GUI-based" systems. I don't know of anyone, however, who has managed
      > to take OO design and apply it in quite the way implied here -- as a
      > basis for the construction of primitives that non-programmers could
      > adapt for specific tasks.  However, the original OO language --
      > Smalltalk -- >was< designed to allow non-programmer users (children)
      > to create significant applications of their own, and it retains, I
      > think, some of this flavour of supporting the combination of
      > experiment, development and processing in a single environment.
      > Furthermore, I know of people who have a set of powerful objects (in
      > Smalltalk, it turns out) they use and enhance over and over again to
      > accomplish very sophisticated text manipulation tasks.
      > Any tool meant to support activities as diverse as those that turn up
      > in humanities text-based computing cannot possibly be trivial to
      > learn or use. The level of professionalism and commitment required
      > for a full use of TuStep is, I think, roughly comparable to that
      > required to learn to work with, say, Perl, or (I think) Smalltalk and
      > text-oriented Smalltalk objects.
      > Best wishes.                        ... john b
      > ----------------------
      > John Bradley
      > john.bradley@kcl.ac.uk
             Date: Wed, 27 Sep 2000 09:36:10 +0100
             From: "Osher Doctorow" <osher@ix.netcom.com>
             Subject: Re: Methodological primitives
    My previous contribution on this topic may have been a bit obscure, so I
    will try a slightly different approach.  My view is that whatever you are
    talking about, it is useless if you cannot make a Shakespearean play about
    it.  On methodological primitives, I will for concreteness consider the
    special case of political history, which is far more concrete than it looks
    in a certain sense.  I maintain that political history has 3 methodological
    primitives (mp's or mps for short), namely, anger, blame, and
    naivete/ignorance (naivete is I think the nice way of referring to
    ignorance).  I propose a 3 actor, 6 act play to illustrate this (3 times 2
    is 6, which is the number of permutations of 3 actors).  For our
    actors/actresses, we will select any 3 characters from Shakespeare, and put
    labels on them, namely, A for anger, B for blame, and N for
    naivete/ignorance.   To show the direction of influence or causation, we
    will have A point to B if A influences B, and so on, and we limit the play
    to 3-person or 3-party influence cases.  Let me translate this play into an
    easier summary.  Political history is composed of angry public A who elect
    or cause to have power political blamers B who blame ignorant or naive
    people N.  It is also composed of naive/ignorant people N who elect or cause
    to have power politicians B who blame angry people A.  It is also composed
    of angry politicians A who enable blamer B to seize power and thus start a
    war against ignorant/naive people N.   Of course, blamers B can also elect
    naive/ignorant person N who starts a preventative war against angry people
    A.   Alternatively, blamers B may decide to elect or give power to an angry
    psychopath or sociopath A who starts a preventative war against
    naive/ignorant people N.   I think the trend here is becoming obvious.  This
    seems to cover political history from prehistoric through modern times, with
    various permutations.
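    The count of six can be checked by listing the orderings. The sketch
    below is an illustration only: the arrow notation and the function name
    are assumptions, with each chain read as "the first influences the
    second, who influences the third".

```python
from itertools import permutations

# The three labels proposed in the text.
LABELS = {"A": "anger", "B": "blame", "N": "naivete/ignorance"}


def influence_chains(labels):
    """Return every ordering of the labels as an 'X -> Y -> Z' chain."""
    return [" -> ".join(order) for order in permutations(sorted(labels))]


chains = influence_chains(LABELS)
for chain in chains:
    print(chain)
print(len(chains), "chains")
```

    Three labels give 3 x 2 x 1 = 6 chains, matching the six acts proposed
    above.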
    Notice carefully that I have not yet introduced computers, even though this
    discussion group concerns humanist computation.   That is because it has not
    yet reached the stage where it involves too much work for people to keep
    track of or accomplish rapidly.  I am trying to be parsimonious here and
    save time and money.  Why spend money when you don't need to (remind me to
    include that among future methodological primitives)?    I am quite sure,
    however, that at some stage computers will be called upon for their
    assistance.  As we turn to more and more complex things than political
    history, I feel certain that computers will find themselves of use.  If
    nothing else, they can keep track of the possibilities that we have
    eliminated.  For example, Ovid's Metamorphoses cannot refer to political
    history since otherwise it would reduce to the above statements.  There must
    be millions of literary works which are excluded by similar grounds, and
    computers are definitely required to keep track of those.
    Yours To Be Continued,
    Osher Doctorow
