                Humanist Discussion Group, Vol. 14, No. 486.
       Centre for Computing in the Humanities, King's College London
               <http://www.princeton.edu/~mccarty/humanist/>
              <http://www.kcl.ac.uk/humanities/cch/humanist/>

  [1]   From:    Wendell Piez <wapiez@mulberrytech.com>            (64)
        Subject: Re: 14.0482 XML, WWW, editions

  [2]   From:    "David Halsted" <halstedd@tcimet.net>             (24)
        Subject: Re: 14.0482 XML, WWW, editions

--[1]------------------------------------------------------------------
        Date: Fri, 10 Nov 2000 09:49:31 +0000
        From: Wendell Piez <wapiez@mulberrytech.com>
        Subject: Re: 14.0482 XML, WWW, editions

Willard,

Replying to letters from John Bradley, David Halsted, Fotis Jannidis,
and Francois Lachance....

I'd be surprised if Fotis and I are not in substantial agreement on the
architectural questions of how large editions (megabytes or gigabytes)
will be deployed on server vs. client. As for what ideas I have, I'd
actually like to pass the ball back to John Bradley (and anyone else)
to carry that one forward, as they have more concrete ideas and
hands-on experience.

I agree with John that the web paradigm we have inherited has, for
better or worse, tended to limit the options; for the kinds of things
we want to do, even a university pipe may not be wide enough, to say
nothing of those of us on 28.8 kbps dial-up. On the other hand, I also
agree with John and David that various kinds of
chunking/indexing/cross-indexing strategies are feasible, and will
probably always be necessary in some cases. We will probably see the
size of integrated resource collections (whether "editions" or not)
grow along with available bandwidth, so the problem will never
disappear even as the limits are pushed outwards.

As to whether XML per se supports, or fails to support, such chunking
(particularly if it is to be transparent to the user), I think it's
safe to say it's neutral: a system could be designed either way (and
either way might be appropriate in different circumstances). A key
design issue here is the framing of metadata at various levels, and
whatever "information inheritance" models are implemented to support
the chunking while maintaining an integrated view (or: how are chunking
and indexing best to be interrelated and managed?).

In this context, for example, it's worth noting that XSLT, the W3C
transformation language supporting XML presentation, is designed
specifically to support a kind of "random access" processing (my term;
not, to my knowledge, a term of art): that is, one can "start styling
the document from any point in the middle". If the language were not
side-effect-free (one of the features of its processing model), one
would have to download an entire document before one could style it, as
(for example) the Panorama browser had to do. Since this is not
necessary with XSLT, the pipeline itself is less of a bottleneck, and
clients are able to bear more of the load (see the sketch below).
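To make the chunk-at-a-time point concrete, here is a minimal sketch in
Java using the standard JAXP/TrAX transformation API
(javax.xml.transform). It is an illustration only, not anyone's actual
system: the file names edition.xsl and act3.xml are invented, and a
JAXP-capable XSLT processor is assumed to be on the classpath.

  import javax.xml.transform.Templates;
  import javax.xml.transform.Transformer;
  import javax.xml.transform.TransformerFactory;
  import javax.xml.transform.stream.StreamResult;
  import javax.xml.transform.stream.StreamSource;

  public class StyleChunk {
      public static void main(String[] args) throws Exception {
          // Compile the stylesheet once. Because XSLT templates are
          // side-effect-free, the compiled Templates object can be
          // applied to any document -- or any well-formed chunk of
          // one -- without regard to what has been processed before.
          TransformerFactory factory = TransformerFactory.newInstance();
          Templates stylesheet =
              factory.newTemplates(new StreamSource("edition.xsl"));

          // Style a single chunk (say, one act of a play) fetched on
          // its own, rather than the whole multi-megabyte edition.
          Transformer t = stylesheet.newTransformer();
          t.transform(new StreamSource("act3.xml"),
                      new StreamResult(System.out));
      }
  }

The design point is in the first comment: because no template's output
can depend on side effects of another, a client can style whatever
fragment it has in hand, which is exactly what makes chunked delivery
workable.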
I would also concur, however (more agreement here), with David and with
Francois in his more outlandish post (that's a compliment, Francois!),
both of whom suggest that the design questions here are really wide
open, and that we'll be seeing many interesting experiments with
peer-to-peer deployments, dynamic editions, etc. etc. This is really a
brave new world. I wonder whether experiments have been done with
scholarly publishing in which many readers have the ability
(simultaneously?!) to amend or alter texts, how such texts can be
framed and deployed, and what results we'll see.

In some respects I think we'll find it's like the 1960s experiments
with audience participation in theater: that it is vitalizing and
enriching, and yet also in some ways, by threatening the formal
integrity of the medium itself, a real risk: so every performance is
either brilliant or a complete bust. For us, it's the line between
research and publication that blurs (as has been remarked on HUMANIST
before), raising similar questions. But we will not be the only
community facing this particular architecture/design problem, by any
means. Think of Internet-based medical informatics, financial
services.... we actually have quite a bit to contribute here, as David
also says.

Best regards,
Wendell

======================================================================
Wendell Piez                            mailto:wapiez@mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

--[2]------------------------------------------------------------------
        Date: Fri, 10 Nov 2000 09:50:21 +0000
        From: "David Halsted" <halstedd@tcimet.net>
        Subject: Re: 14.0482 XML, WWW, editions

The discussion of scholarly texts and the problems involved in making
them useful online got me wanting to experiment, and I've written an
extremely primitive SAX parser (based on Xerces) that reads a set of
URLs in from an XML file and looks through all of the documents it
finds for lines that match a string you feed in at the command line.
It's invoked like this:

  java [-cp classpath] ShakesRead [urlsFile.xml] [string_to_find]

It returns the name of the document, the line number at which the
string was found, and the line in which the string was found (this
version expects the content it's looking for to be in a <LINE></LINE>
tag pair, but that could be made more useful). Nothing earth-shaking,
but it is precisely a kind of client that looks for useful information
in an arbitrarily large number of remote XML documents and tells you
where that information is located. (A rough sketch of such a handler
appears after this message.)

I don't know whether anybody would find such a thing useful, but I can
imagine some potentially useful modifications, like allowing the
program to take, say, a set of lines in a poem and look for each of the
words used there in turn across different "libraries" of XML documents,
grouped by author, period, genre, whatever. If anybody wants to try
playing around with this program-let, I'd be pleased to share it, or I
could put a version online somewhere if people are interested in trying
it out that way.

The main point was to argue that clients can already be used to take
advantage of online XML, even if the documents in question are fairly
large -- and also, a bit, that putting an XML version of your favorite
scholarly materials online is worthwhile; now people can really use
it . . .

Dave Halsted
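The following is a minimal sketch of a handler along the lines David
describes, reconstructed from his description rather than taken from
his actual code: the class name LineSearch and all identifiers are
invented, and where his ShakesRead reads a list of URLs from an XML
file, this version scans a single document URL for brevity. It uses the
standard SAX2 API with the Xerces driver class.

  import org.xml.sax.Attributes;
  import org.xml.sax.Locator;
  import org.xml.sax.XMLReader;
  import org.xml.sax.helpers.DefaultHandler;
  import org.xml.sax.helpers.XMLReaderFactory;

  // Reports every <LINE> element whose text contains the target
  // string, using the SAX Locator to recover a line number.
  public class LineSearch extends DefaultHandler {
      private final String docName;  // URL of the document being scanned
      private final String target;   // string from the command line
      private Locator locator;
      private StringBuffer buf;      // non-null only inside a <LINE>

      public LineSearch(String docName, String target) {
          this.docName = docName;
          this.target = target;
      }

      public void setDocumentLocator(Locator locator) {
          this.locator = locator;  // called by the parser before parsing
      }

      public void startElement(String uri, String local, String qName,
                               Attributes atts) {
          if ("LINE".equals(qName)) buf = new StringBuffer();
      }

      public void characters(char[] ch, int start, int length) {
          if (buf != null) buf.append(ch, start, length);
      }

      public void endElement(String uri, String local, String qName) {
          if ("LINE".equals(qName)) {
              String line = buf.toString();
              if (line.indexOf(target) >= 0)
                  System.out.println(docName + ":"
                      + locator.getLineNumber() + ": " + line);
              buf = null;
          }
      }

      public static void main(String[] args) throws Exception {
          XMLReader reader = XMLReaderFactory
              .createXMLReader("org.apache.xerces.parsers.SAXParser");
          reader.setContentHandler(new LineSearch(args[0], args[1]));
          reader.parse(args[0]);
      }
  }

Run as, for example, java LineSearch http://example.org/hamlet.xml
"to be" (URL invented). One caveat of this approach: the Locator
reports the position of the closing </LINE> tag, so the number is
approximate, though close enough for a finding aid.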