                Humanist Discussion Group, Vol. 14, No. 277.
       Centre for Computing in the Humanities, King's College London
               <http://www.princeton.edu/~mccarty/humanist/>
            <http://www.kcl.ac.uk/humanities/cch/humanist/>

  [1]   From:    Wendell Piez <wapiez@mulberrytech.com>            (125)
        Subject: Re: 14.0272 methodological primitives

  [2]   From:    "Ian Lancashire" <ian@chass.utoronto.ca>           (89)
        Subject: Re: 14.0272 methodological primitives

  [3]   From:    "Osher Doctorow" <osher@ix.netcom.com>             (42)
        Subject: Re: Methodological primitives


--[1]------------------------------------------------------------------
        Date: Wed, 27 Sep 2000 09:34:28 +0100
        From: Wendell Piez <wapiez@mulberrytech.com>
        Subject: Re: 14.0272 methodological primitives

Hi Willard and HUMANIST:

At 07:27 AM 9/26/00 +0100, John Bradley wrote:
....
>In Object Oriented (OO) design, there is another way to design
>processing which is these days very much in fashion. One perhaps key
>difference: Object Oriented design blurs the distinction Willard made
>in his first posting on this subject between data and process, and I
>think this makes a dramatic difference in the way one looks at the
>whole issue. It seems particularly well suited for modelling
>processes that involve the production of "interactive" and
>"GUI-based" systems. I don't know of anyone, however, who has managed
>to take OO design and apply it in quite the way implied here -- as a
>basis for the construction of primitives that non-programmers could
>adapt for specific tasks.

This is fascinating stuff. John's point about the underlying
assumption in OO design -- that it merges the conception, in modeling,
of data and process -- is very well taken. It's especially interesting
in this context because, as these systems evolve, the old ideas and
approaches naturally come up time and again. In the context of OO
(especially, say, Java, with its promise of portability and the
long-term robustness that comes with platform-independence), we see
the pendulum swing back again with the emergence of markup-based
(specifically XML-based) systems.

A key reason OO approaches work well for interactive GUIs and other
process-intensive work is, in fact, that even while they can support
strongly encapsulated architectures (more easily modified and
maintained), OO programs can take shortcuts to achieve functionality,
at the price of locking in their data to a particular data model (and
hence, usually, a particular format). But who wants to be storing
their conference papers as Java objects? Of course, the next step is
to abstract and formalize a portable data model outside the
implementation, pulling data back away from process (at least to
whatever extent is possible). By providing a standard syntax
supporting off-the-shelf tools, XML eases this work greatly. In the
business, we've used the analogy, "if Java is a way to build your
toaster, then XML is sliced bread." This tries to identify a key
advantage of a standards-based markup syntax: that, in theory at least
(and increasingly in practice), it should now be possible to use OO
languages to work in the way they want -- with sophisticated data
models (not merely streams of characters) -- and yet not lock our data
into the specific processing environment we happen to be using at the
moment.

>Any tool meant to support activities as diverse as those that turn up
>in humanities text-based computing cannot possibly be trivial to
>learn or use.
>The level of professionalism and commitment required
>for a full use of TuStep is, I think, roughly comparable to that
>required to learn to work with, say, Perl, or (I think) Smalltalk and
>text-oriented Smalltalk objects.

I think that's fair, since any toolset whose native data set is a file
containing a stream of characters must work on that basis, inferring
more complex data structures where it can (by parsing), but not
assuming in the general case that those particular data structures are
there in that form. For one thing, in Humanities Computing as it
stands, it's fair to assume they're not.

In order to build a more "intuitive" system (say, a GUI-driven system
allowing on-the-fly manipulation of texts), a more sophisticated data
model needs to be assumed, one that can support more complex
operations in a generalized way. To "sort entries in Swedish lexicon
order" or "sort entries in Icelandic name order", the system has to
know both what these orders are and what an "entry" is. XML, by
providing for a particular kind of tree structure, is beginning to
provide at least an infrastructure within which such knowledge is
embedded, so we can now begin to use standard syntaxes such as XPath
(co-designed by a Computing Humanist, Steve DeRose) for some of this.
(XPath can't sort, but it can do some other fancy stuff such as filter
by content, so that '//line[contains(., "To be")]' will return all
<line> elements in a document that contain the string "To be".)
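To make that concrete, here is a minimal sketch -- my own illustration,
not anything from the original postings, and the element names
<lexicon>, <entry> and <headword> are invented -- of how the Swedish
sort might be expressed in XSLT 1.0, once the markup has committed to
saying what an "entry" is. Whether lang="sv" actually yields Swedish
collation is left to the individual processor:

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Sort hypothetical <entry> elements by their <headword>. -->
      <xsl:template match="/lexicon">
        <lexicon>
          <xsl:for-each select="entry">
            <!-- XPath alone cannot sort; xsl:sort supplies the order,
                 and lang="sv" requests Swedish collation (support for
                 it is processor-dependent). -->
            <xsl:sort select="headword" lang="sv"/>
            <xsl:copy-of select="."/>
          </xsl:for-each>
        </lexicon>
      </xsl:template>
    </xsl:stylesheet>

The point is that the stylesheet can only ask for this because the
markup has already made entries and headwords explicit in the data
model.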
Consequently, we are beginning to see some of these capabilities
emerging as XML tools. For example, Sun Microsystems has an "EAI"
product (the TLA stands for "Enterprise Application Integration")
called Forte Fusion (that 'e' has an accent mark that I don't trust
your mailer to render) that allows a user to set up a data process
flow chain in which an XML data set can be passed through a series of
processors, including, prominently, XSLT transformations that could be
doing filtering, sorting, or analytical work. The idea is that when
you click on the form to submit your order for the new American Civil
War battle game, your order can be parsed, and the Authorization,
Shipping and Billing departments at NorthernAggression.com can all get
the appropriate pieces of your order (some of which might already be
in the system, since you're a regular) in a timely way, following
whatever internal logic is required (e.g. don't send the game out if
your credit card bounces again). The whole thing works with a GUI:
little icons represent your filtering and processing engines, with, as
it were, a pipe carrying the data between them. The different engines
can be disparate, running on different systems and platforms: a Unix
server here running a batch program in Perl, an XSLT transform on a
client over there, and so forth. But to build something like this, you
have to have a fairly stable data model. (In this case, the system is
going to do special things with your name, address, credit card
number, etc.)
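As a sketch of what a single stage in such a chain might look like --
again my own invented example with made-up element names, not anything
taken from Forte Fusion -- here is an XSLT 1.0 filter that passes
along only the <line> elements containing "To be", ready to hand its
output to the next processor in the pipe:

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- One hypothetical pipeline stage: filter by content. -->
      <xsl:template match="/play">
        <play>
          <!-- Keep only lines containing the string "To be";
               everything else is dropped from the output. -->
          <xsl:copy-of select=".//line[contains(., 'To be')]"/>
        </play>
      </xsl:template>
    </xsl:stylesheet>

A sorting stage, a tagging stage, or an analytical report could then
be chained after it, each one reading the XML the previous stage
emits.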
At this stage, it is too early to say when such a data model will be
possible or feasible for the kind of analysis we want to do in
Humanities Computing -- especially considering that we commonly work
at the level of the "word" (whatever that is), not just element types,
and want access to orthographical variants, morphologies, synonyms,
etc., intelligence about all of which has to be stored somewhere in
some sufficiently tractable (and long-lived) form. Not to mention the
problem of sense-disambiguation (I love Prof. Ott's bit about the
"content provider" becoming a "satisfied donor"). Our work with
higher-level linguistic and literary structures has barely started.

Also, to be an iconoclast about it, I am not sure it is our best
course to move forward pell-mell in this direction without being
extremely critical of the task itself. Every lens comes with its
blindness, and as we design these capabilities into systems, by
deciding what we want to look at, we will also be deciding what we
don't care to see. I am very much in favor of experimental work to
design and deploy whatever higher-level structures we can discern,
trace, and render malleable with these powerful tools. But I also
believe that great works of literature will continue to evade whatever
structures we impose on them, just as they always have, it being the
primary work of every poet to reinvent the art of poetry from scratch.
And we should be wary not only for ourselves, but for the role we have
to play in the larger world's understanding of its own rhetorics and
how they work. It does little good to say that the Emperor has no
clothes if you haven't been taking care of your own wardrobe.

So, while I'm not going to be quitting work myself on methodological
primitives, I'm not confident that you're going to see them anytime
soon in a form that a naive user, without knowledge of the sordid
details of text encoding, could simply sit down with, tinker with, and
get instantly useful and trustworthy results. "Epiphany In a Box"?
Which is a good thing. After all, isn't it our role to show the naive
user what's *really* going on?

Best regards,
Wendell

======================================================================
Wendell Piez                            mailto:wapiez@mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

--[2]------------------------------------------------------------------
        Date: Wed, 27 Sep 2000 09:35:10 +0100
        From: "Ian Lancashire" <ian@chass.utoronto.ca>
        Subject: Re: 14.0272 methodological primitives

The best set of text-based utilities can be found in UNIX, the next
best in Dan Melamed's perl tools at
http://www.cis.upenn.edu/~melamed/ . The 1980s Hum was a gem too. It
still is.

Susan Hockey, as usual, was prescient: she saw the need for someone in
the humanities to learn basic programming and to assemble groups of
these "primitives." Her fine book on Snobol programming enables the
altruistic humanist still. Earlier, Nancy Ide published a book on
Pascal for the humanities. (This isn't meant to be an exhaustive list
....)

Maybe one of the unforeseen effects of relying on professional
programmers to create big pieces of software like TACT and
WordCruncher is to encourage scholars in the humanities to believe
that they can get along without being able to write small programs or
adapt ones created by other people. (This too is a debate I have
overheard intermittently over several decades.)

Ott's comments on the impediments to releasing primitives that would
satisfy all and sundry come from an expert programmer. The world of
cybertext is too complex now. We will also never all agree on how, for
whatever purpose, to symbolize the more fundamental primitives
embedded in any programming language.

Ian Lancashire
Toronto

----- Original Message -----
From: Humanist Discussion Group (willard.mccarty@kcl.ac.uk)
      <willard@lists.village.virginia.edu>
To: Humanist Discussion Group <humanist@lists.Princeton.EDU>
Sent: Tuesday, September 26, 2000 2:27 AM

>            Humanist Discussion Group, Vol. 14, No. 272.
>    Centre for Computing in the Humanities, King's College London
>            <http://www.princeton.edu/~mccarty/humanist/>
>         <http://www.kcl.ac.uk/humanities/cch/humanist/>
>
>
>        Date: Tue, 26 Sep 2000 07:16:14 +0100
>        From: John Bradley <john.bradley@kcl.ac.uk>
>        Subject: Re: 14.0258 methodological primitives?
>
> Willard: I would certainly support anyone who took the view that
> Wilhelm Ott's TuStep system provides a very solid set of "primitives"
> for the scholarly manipulation of text. I have spent many hours of
> time examining their design (although I confess that my actual
> experience of using them has been very limited indeed) and can well
> appreciate that they could be combined to deal with a very large
> number of text manipulation needs. Anyone seriously interested in
> thinking about what a design needs to include in detail would benefit
> much from examining TuStep in this way.
>
> The approach towards tools for generalised processing shown in TuStep
> is, from the computing perspective, a very old one -- but at the same
> time it is a model that is still often applied when a computing
> professional needs to do a complex computing task him/herself. The
> UNIX environment, with its basic "filtering" tools, a sorting
> program, some programmable text-oriented editors, and things like
> Perl, is based on very similar approaches.
>
> In Object Oriented (OO) design, there is another way to design
> processing which is these days very much in fashion. One perhaps key
> difference: Object Oriented design blurs the distinction Willard made
> in his first posting on this subject between data and process, and I
> think this makes a dramatic difference in the way one looks at the
> whole issue. It seems particularly well suited for modelling
> processes that involve the production of "interactive" and
> "GUI-based" systems. I don't know of anyone, however, who has managed
> to take OO design and apply it in quite the way implied here -- as a
> basis for the construction of primitives that non-programmers could
> adapt for specific tasks. However, the original OO language --
> Smalltalk -- *was* designed to allow non-programmer users (children)
> to create significant applications of their own, and it retains, I
> think, some of this flavour of supporting the combination of
> experiment, development and processing in a single environment.
> Furthermore, I know of people who have a set of powerful objects (in
> Smalltalk, it turns out) they use and enhance over and over again to
> accomplish very sophisticated text manipulation tasks.
>
> Any tool meant to support activities as diverse as those that turn up
> in humanities text-based computing cannot possibly be trivial to
> learn or use. The level of professionalism and commitment required
> for a full use of TuStep is, I think, roughly comparable to that
> required to learn to work with, say, Perl, or (I think) Smalltalk and
> text-oriented Smalltalk objects.
>
> Best wishes.  ...
> john b
> ----------------------
> John Bradley
> john.bradley@kcl.ac.uk

--[3]------------------------------------------------------------------
        Date: Wed, 27 Sep 2000 09:36:10 +0100
        From: "Osher Doctorow" <osher@ix.netcom.com>
        Subject: Re: Methodological primitives

My previous contribution on this topic may have been a bit obscure, so
I will try a slightly different approach. My view is that whatever you
are talking about, it is useless if you cannot make a Shakespearean
play about it.

On methodological primitives, I will for concreteness consider the
special case of political history, which is far more concrete than it
looks in a certain sense. I maintain that political history has 3
methodological primitives (mp's or mps for short), namely anger,
blame, and naivete/ignorance (naivete is, I think, the nice way of
referring to ignorance). I propose a 3-actor, 6-act play to illustrate
this (3 times 2 is 6, which is the number of permutations of 3
actors). For our actors/actresses, we will select any 3 characters
from Shakespeare and put labels on them, namely A for anger, B for
blame, and N for naivete/ignorance. To show the direction of influence
or causation, we will have A point to B if A influences B, and so on,
and we limit the play to 3-person or 3-party influence cases.

Let me translate this play into an easier summary. Political history
is composed of angry public A who elect or cause to have power
political blamers B who blame ignorant or naive people N. It is also
composed of naive/ignorant people N who elect or cause to have power
politicians B who blame angry people A. It is also composed of angry
politicians A who enable blamer B to seize power and thus start a war
against ignorant/naive people N. Of course, blamers B can also elect
naive/ignorant person N who starts a preventative war against angry
people A. Alternatively, blamers B may decide to elect or give power
to an angry psychopath or sociopath A who starts a preventative war
against naive/ignorant people N. I think the trend here is becoming
obvious. This seems to cover political history from prehistoric
through modern times, with various permutations.

Notice carefully that I have not yet introduced computers, even though
this discussion group concerns humanist computation. That is because
it has not yet reached the stage where it involves too much work for
people to keep track of or accomplish rapidly. I am trying to be
parsimonious here and save time and money. Why spend money when you
don't need to (remind me to include that among future methodological
primitives)? I am quite sure, however, that at some stage computers
will be called upon for their assistance. As we turn to more and more
complex things than political history, I feel certain that computers
will find themselves of use. If nothing else, they can keep track of
the possibilities that we have eliminated. For example, Ovid's
Metamorphoses cannot refer to political history, since otherwise it
would reduce to the above statements. There must be millions of
literary works which are excluded on similar grounds, and computers
are definitely required to keep track of those.

Yours To Be Continued,

Osher Doctorow