14.0482 XML, WWW, editions

From: by way of Willard McCarty (willard@lists.village.Virginia.EDU)
Date: 11/08/00

  • Next message: by way of Willard McCarty: "14.0475 items from Scout Report"

                   Humanist Discussion Group, Vol. 14, No. 482.
           Centre for Computing in the Humanities, King's College London
                   <http://www.princeton.edu/~mccarty/humanist/>
                  <http://www.kcl.ac.uk/humanities/cch/humanist/>
    
       [1]   From:    John Bradley <john.bradley@kcl.ac.uk>               (56)
             Subject: Re: 14.0469 XML & WWW; XML references; a broader
                     question
    
       [2]   From:    "David Halsted" <halstedd@tcimet.net>               (29)
             Subject: Re: 14.0469 XML & WWW
    
       [3]   From:    "Fotis Jannidis" <fotis.jannidis@lrz.uni-           (22)
                     muenchen.de>
             Subject: Re: 14.0469 XML & WWW
    
       [4]   From:    lachance@chass.utoronto.ca (Francois Lachance)      (42)
             Subject: imprint, edition, publication
    
    
    --[1]------------------------------------------------------------------
             Date: Wed, 08 Nov 2000 09:39:03 +0000
             From: John Bradley <john.bradley@kcl.ac.uk>
             Subject: Re: 14.0469 XML & WWW; XML references; a broader question
    
    
       >btw, I don't think that xml aware clients will be the solution for this
       >problem, because of the size of the editions.
    
    Wendell: I also share the view that Fotis is expressing here,
    although I must say that I have had so little time to do serious work
    in this area that I'm not sure my opinions should count for TOO much
    these days!
    
    Nonetheless, it seems to me that the WWW (and also much of the
    development work at W3C) is predicated on the unspoken assumption
    that the amount of data to be exchanged between server and client is
    relatively small.  This model may be fine for the kind of
    transational-oriented B2B applications that seem to be driving
    developments these days.  However, it appears to be a serious problem
    when looking at the scholarly use of texts.  I recall the first time
    this observation struck me -- several years ago when I went to the
    text archive site at University of Virginia (or was it Michigan?) and
    fetched their relatively-lightly marked up SGML-TEI documents using
    (as I recall) Panorama.  By the nature of the web access, and the
    "document-oriented" nature of SGML, (and, to be fair, perhaps the way
    that Panorama worked then) I had to fetch the entire document before
    seeing it.  It took a very long time -- as I recall about 30 minutes
    (this was when I was still at U of Toronto) -- before I saw anything
    of the document at all.  Suppose that instead of looking (merely
    trying to read!) a novel by Dickens I had been trying to do some
    analysis on all of Dickens' works.  The slowness would have been only
    one of the problems.  At the time it seemed to me that this approach
    -- shipping the entire document in a single gulp over the Internet
    before anything could be done with it -- was not going to gain wide
    acceptance for material of this kind.  The HTML representation of the
    same material was easier to handle because it had been split up into
    chunks -- but it seems to me that for scholarly use of text at least
    this chunking (except for straightforward reading on screen or
    printing out) was unfortunate at least, and, of course, the only
    markup one had to work with was HTML.
    
    It might be possible to divide the document into chunks for XML
    processing as well, although (it seems to me at least) by the nature
    of the way that SGML and XML work, the chunked version becomes at
    least in some sense different from the unchunked one when split in
    separate pieces.  I know, of course, that XPointer links can be made
    between separate documents, and someday widely available software
    will be able to deal with them -- but the chunking of materials into
    separate XML documents, not just the linking between them, is, I
    think, undesirable.  This becomes more and more of an issue when the
    amount of text in the document becomes larger, and the links between
    different parts (the thereby implied kind of processing one might
    want to do on those links) more intricate -- think about analysing
    text chunks that cross the boundaries between chunks provided by the
    electronic publisher, for example.
    
    You may recall that I raised this problem at my presentation at
    Virginia, and proposed there an architectural model that is XML based
    but is not based on the HTTP-WWW document chunking model. Whether it
    is any good or not, of course, would require me to develop it
    further!
    
    All the best.                                 ... john b
    ----------------------
    John Bradley
    john.bradley@kcl.ac.uk
    
    
    
    
    --[2]------------------------------------------------------------------
             Date: Wed, 08 Nov 2000 09:39:56 +0000
             From: "David Halsted" <halstedd@tcimet.net>
             Subject: Re: 14.0469 XML & WWW
    
    
    Edition size could be addressed in a number of ways.  It's true that it's
    probably not useful to think of individual desktops chunking through a large
    number of very large XMLs retrieved on the fly from remote machines, but it
    might be possible to think of, say, individual servers indexing a group of
    XML documents that are actually "stored" on other servers and making the
    index available for a set of users with shared interests.  In addition,
    sites with lots of XML behind them could make useful drill-downs available
    to users as well, and expose the results in XML.  So you could have a very
    nice set of mixed modes; sites with lots of XML could use server-side tools
    (including databases) to optimize searching, but could also expose the XML
    data stores, enabling anybody with enough machine to run their own searches
    against the data.  Users finding the site-provided tools inadequate could
    beef their RAM and manipulate the data themselves to meet their own needs;
    in fact, those users could expose the results of their research as XML and
    enable the original store to provide a link to their results.  Depending on
    the field, the results might become part of the underlying data store or
    simply build a searchable interpretive layer on top of the raw data.
    
    Eventually, we get to move beyond thinking about servers and clients, to
    thinking about severs talking to servers and people sort of "peeking in" to
    the data, asking the servers to provide the information they want from a
    connected series of other servers with data exposed in XML, that is,
    publicly queryable.  It'd be nice to see Humanities computing develop some
    things here; texts and published research can be public in a way that
    corporate data can't, so perhaps the true potential in distributed XML
    models can be realized more quickly with online Humanities computing.
    
    Dave Halsted
    
    ***
    David G. Halsted, PhD
    Consultant (XML/RDBMS/Web apps)
    halstedd@tcimet.net
    
    --[3]------------------------------------------------------------------
             Date: Wed, 08 Nov 2000 09:40:30 +0000
             From: "Fotis Jannidis" <fotis.jannidis@lrz.uni-muenchen.de>
             Subject: Re: 14.0469 XML & WWW
    
    From:    Wendell Piez <wapiez@mulberrytech.com>              (18)
    
      > How large do you expect these editions to be?
    
    What we have now are electronic editions with some megabyte. To
    give an example: the rather small edition "Der junge Goethe in seiner
    Zeit" (young goethe in his time) has about 35 MB. But this will grow
    quickly and I expect editions on one server to have some gigabytes
    in 10-20 years. I am not talking about commercial editions like the
    ones offered by Chadwyck-Healey, because they can solve these
    interoperability problems within their company, but about editions
    put on the net by the scholars who created them.
    
      > Why would server-side
      > processing be better for large editions?
    
    At the moment: because the browsers can't offer any kind of
    processing which would be useful to solve this problem. In the
    future: Probably there will be a division of labor between xml
    browsers and server. It would make our work easier if we agree
    early upon a common solution.
    
    
      > Or possibly I mistake you. If you mean to say XML-aware clients will not be
      > the *entire* solution to the problem, I agree.
    
    Yes, that is exactly what I wanted to say.
    But your question sounds to me like you have some ideas how to
    handle these problems. I am very interested in any ideas.
    
    Fotis Jannidis
    
    --[4]------------------------------------------------------------------
             Date: Wed, 08 Nov 2000 09:41:42 +0000
             From: lachance@chass.utoronto.ca (Francois Lachance)
             Subject: imprint, edition, publication
    
    Patrick,
    
    How would your argument about the openendedness of electronic editions
    work if the volatility of texts were a consequence of social practices
    and less so of technologically determined paradigms? (The question is
    of course moot if you consider "paradigms" to be expressions of social
    practice.)
    
    I am just a little wary of a quasi-ahistorical assertion of a single
    monolithic "print-medium paradigm of publication". And so I like to
    generalize in a most grandiose fashion:
    
    All texts are volatile. Electronic distribution may actually help
    preserve the variants that contribute to the creation of an edition.
    
    The vapours are captured in many media. Paper plus voice plus screen
    contribute to preservation of variation.
    
    A consideration of multimedia and audiovisual components of textual
    expression certainly challenges the often dichotomous crypto-mcluhanesque
    debate over print versus electronic.
    
    If an edition is a set of readings of records of performances, by its very
    matricial structure it is not only a gathering of what was witnessed but
    also an index of what might have been. Whatever the medium in which it is
    expressed, an edition contains a certain amount of conjecture. And it is
    the opening of an edition's working hypotheses to testing that contribute
    to its incompleteness (in the sense of possible world semantics) --- not
    the medium in which the expression of those working hypotheses are fixed.
    
    I just wonder how the link between systems of distribution and authorial
    control is any different for the written word, the spoken word, the film,
    the song, the symphony, the painting either hung in a gallery or
    reproduced as a digital image. We can ask ourselves what cultural
    conditions result in gallery spaces where viewers can adjust the lighting
    or concert spaces where the sound is not uniform (for example Morton
    Feldman's _Rothko Chapel_) for every point in the space.
    
    There is a wholesale attitude towards temporality and the possibility of
    intersubjective experience that accompanies people's use of media and
    their discourse about the use of media. Some of us begin from a
    non-Parmenidean position: change is the very basis upon which we can build
    shared experiences. Media can help in two ways: as facilitators of change
    and preservation; as facilitators of sharing (and hoarding). I'm not
    quite sure if a necessary (as opposed to fortuitous) connection exists
    between the two types of facilitation. Any thoughts?
    
    -- 
    Francois Lachance, Scholar-at-large
    	http://www.chass.utoronto.ca/~lachance
    Member of the Evelyn Letters Project
    	http://www.chass.utoronto.ca/~dchamber/evelyn/evtoc.htm
    



    This archive was generated by hypermail 2b30 : 11/08/00 EST