14.0486 XML, WWW, editions

From: by way of Willard McCarty (willard@lists.village.Virginia.EDU)
Date: 11/10/00


                   Humanist Discussion Group, Vol. 14, No. 486.
           Centre for Computing in the Humanities, King's College London
                   <http://www.princeton.edu/~mccarty/humanist/>
                  <http://www.kcl.ac.uk/humanities/cch/humanist/>
    
       [1]   From:    Wendell Piez <wapiez@mulberrytech.com>              (64)
             Subject: Re: 14.0482 XML, WWW, editions
    
       [2]   From:    "David Halsted" <halstedd@tcimet.net>               (24)
             Subject: Re: 14.0482 XML, WWW, editions
    
    
    --[1]------------------------------------------------------------------
             Date: Fri, 10 Nov 2000 09:49:31 +0000
             From: Wendell Piez <wapiez@mulberrytech.com>
             Subject: Re: 14.0482 XML, WWW, editions
    
    Willard,
    
    Replying to letters from John Bradley, David Halsted, Fotis Jannidis,
    Francois Lachance....
    
    I'd be surprised if Fotis and I are not in substantial agreement on the
    architectural questions of how we will see large editions (megabytes or
    gigabytes) deployed on server vs. client. As for what ideas I have, I'd
    actually like to pass the ball back to John Bradley (and anyone else) to
    carry that one forward, as they have more concrete ideas and hands-on
    experience.
    
    I agree with John that the web paradigm we have inherited, for better or
    worse, has tended to limit the options; for the kinds of things we want to
    do, even a university pipe may not be wide enough, to say nothing of those
    of us on 28.8 kbps dial-up. On the other hand, I also agree with John and
    David that
    various kinds of chunking/indexing/cross-indexing strategies are feasible,
    and will probably always be necessary in some cases. We will probably see
    the size of integrated resource collections (whether "editions" or not)
    grow along with available bandwidth, so the problem will never disappear
    even as the limits are pushed outwards. As to whether XML per se supports,
    or fails to support, such chunking (particularly if it is to be transparent
    to the user), I think it's safe to say it's neutral: a system could be
    designed either way (and either way might be appropriate in different
    circumstances). A key design issue here is the framing of metadata at
    various levels, and whatever "information inheritance" models are
    implemented to support the chunking while maintaining an integrated view
    (or: how are chunking and indexing to be best interrelated and managed?).
    
    In this context, for example, I think it's worth noting that XSLT, the
    W3C transformation language supporting XML presentation, is designed
    specifically to support a kind of "random access" processing (my term;
    not, to my knowledge, a term of art for this), that is, "start styling
    the document from any point in the middle". If the language were not
    side-effect-free (one of the features of its processing model), one would
    have to download an entire document before one could style it, as (for
    example) the Panorama browser had to do. Since this is not necessary with
    XSLT, the pipeline itself is less of a bottleneck, and clients are able to
    bear more of the load.
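    
    To make that concrete, here is a minimal stylesheet of my own devising
    (an illustration, not anything from the posts under discussion). Because
    each template rule is a pure function of its context node, a processor
    could in principle fire the LINE template below without having traversed
    the rest of the document first:
    
        <?xml version="1.0"?>
        <xsl:stylesheet version="1.0"
                        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <!-- Styles any LINE element using only that element's content -->
          <xsl:template match="LINE">
            <p><xsl:value-of select="."/></p>
          </xsl:template>
        </xsl:stylesheet>
    
    Whether a given processor actually exploits this is an implementation
    matter, of course; the point is that the language's design permits it.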
    
    I would also concur, however (more agreement here) with David and with
    Francois in his more outlandish post (that's a compliment, Francois!), both
    of whom suggest that the design questions here are really wide open, and
    that we'll be seeing many interesting experiments with peer-to-peer
    deployments, dynamic editions, etc. etc. This is really a brave new world.
    I wonder whether experiments in scholarly publishing in which many
    readers have the capability (simultaneously?!) to amend or alter texts
    have been done, how such texts can be framed and deployed, and what
    results we'll see. In some respects I think we'll find it's like the
    1960s experiments with audience participation in theater: vitalizing and
    enriching, and yet also, by threatening the formal integrity of the
    medium itself, a real risk: every performance is either brilliant or a
    complete bust. For us, it's the line between research and
    publication that blurs (as has been remarked on HUMANIST before), raising
    similar questions.
    
    But we will not be the only community facing this particular
    architecture/design problem, by any means. Think of Internet-based medical
    informatics, financial services.... we actually have quite a bit to
    contribute here, as David also says.
    
    Best regards,
    Wendell
    
    ======================================================================
    Wendell Piez                            mailto:wapiez@mulberrytech.com
    Mulberry Technologies, Inc.                http://www.mulberrytech.com
    17 West Jefferson Street                    Direct Phone: 301/315-9635
    Suite 207                                          Phone: 301/315-9631
    Rockville, MD  20850                                 Fax: 301/315-8285
    ----------------------------------------------------------------------
        Mulberry Technologies: A Consultancy Specializing in SGML and XML
    ======================================================================
    
    
    
    
    --[2]------------------------------------------------------------------
             Date: Fri, 10 Nov 2000 09:50:21 +0000
             From: "David Halsted" <halstedd@tcimet.net>
             Subject: Re: 14.0482 XML, WWW, editions
    
    The discussion of scholarly texts and the problems involved in making them
    useful online got me wanting to experiment, and I've written an extremely
    primitive SAX parser (based on Xerces) that reads a set of URLs in from an
    XML file and looks through all of the documents it finds for lines that
    match a string you feed in at the command line.  It's invoked like this:
    
    java [-cp classpath] ShakesRead [urlsFile.xml] [string_to_find]
    
    It returns the name of the document, the line number at which the string was
    found, and the line in which the string was found (this version expects the
    content it's looking for to be in a <LINE></LINE> tag pair, but that could
    be made more useful).  Nothing earth-shaking, but it is precisely a kind of
    client that looks for useful information in an arbitrarily large number of
    remote XMLs and tells you where that information is located.  I don't know
    whether anybody would find such a thing useful, but I can imagine some
    potentially useful modifications, like allowing the program to take, say, a
    set of lines in a poem and look for each of the words used there in turn
    across different "libraries" of XMLs, grouped by author, period, genre,
    whatever.
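    
    For readers who want to see the shape of such a client before Dave shares
    his, here is a minimal sketch using the standard SAX API (which Xerces
    implements). It handles one document URL at a time rather than reading a
    list from urlsFile.xml, and the class name and bodies are illustrative,
    not Dave's actual ShakesRead:
    
        import javax.xml.parsers.SAXParser;
        import javax.xml.parsers.SAXParserFactory;
        import org.xml.sax.Attributes;
        import org.xml.sax.Locator;
        import org.xml.sax.helpers.DefaultHandler;
    
        public class LineSearch extends DefaultHandler {
            private final String docName;  // document being searched
            private final String target;   // string to find
            private Locator locator;       // supplies line numbers
            private StringBuffer text;     // content of the current <LINE>
    
            public LineSearch(String docName, String target) {
                this.docName = docName;
                this.target = target;
            }
    
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }
    
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                if ("LINE".equals(qName)) text = new StringBuffer();
            }
    
            public void characters(char[] ch, int start, int length) {
                if (text != null) text.append(ch, start, length);
            }
    
            public void endElement(String uri, String local, String qName) {
                if ("LINE".equals(qName) && text != null) {
                    String line = text.toString();
                    if (line.indexOf(target) >= 0) {
                        // line number is reported at the </LINE> end tag
                        System.out.println(docName + ", line "
                            + locator.getLineNumber() + ": " + line);
                    }
                    text = null;
                }
            }
    
            public static void main(String[] args) throws Exception {
                // args[0]: URL of an XML document; args[1]: string to find
                SAXParser p = SAXParserFactory.newInstance().newSAXParser();
                p.parse(args[0], new LineSearch(args[0], args[1]));
            }
        }
    
    Invoked as, say, "java LineSearch http://example.org/hamlet.xml nunnery"
    (the URL is made up); extending it to loop over a set of URLs read from
    urlsFile.xml is straightforward.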
    
    If anybody wants to try playing around with this program-let, I'd be pleased
    to share it, or I could put a version online somewhere if people are
    interested in trying it out that way.  The main point was to argue that
    clients can already be used to take advantage of online XML, even if the
    documents in question are fairly large -- and also, a bit, that putting an
    XML version of your favorite scholarly materials online is worthwhile; now
    people can really use it . . .
    
    Dave Halsted
    


