Humanist Discussion Group, Vol. 13, No. 522.
Centre for Computing in the Humanities, King's College London
<http://www.princeton.edu/~mccarty/humanist/>
<http://www.kcl.ac.uk/humanities/cch/humanist/>
[1] From: Alexander Nakhimovsky <sasha@cs.colgate.edu> (41)
Subject: Text Analysis Tools for XML documents: a Web
application
[2] From: John Dawson <jld1@cam.ac.uk> (17)
Subject: Re: Text Analysis Tools for XML documents: a Web
application
--[1]------------------------------------------------------------------
Date: Wed, 29 Mar 2000 19:10:45 -0600
From: Alexander Nakhimovsky <sasha@cs.colgate.edu>
Subject: Text Analysis Tools for XML documents: a Web application
An announcement: Text Analysis Tools for XML documents: a Web application
The release, last November, of the XSLT and XPath Recommendations created a
new range of possibilities for text-analysis tools. Since January, a
project at Colgate University in the US has been developing a set of tools
with the following design goals:
-- the tools are available over the network as a Web application;
-- the tools are DTD independent: the user interface is constructed
automatically on the basis of the document's DTD;
-- the queries that the tools can process use XPath to express structural
query conditions and Regular Expressions to describe the text patterns of
the query;
-- the tools are extensible: if XSLT cannot do a query, it can be relegated
to an extension function written in a general-purpose programming language
(Java most easily);
-- secondary documents, such as concordances, frequency counts, inverted
indices and so on, are kept as XML documents, optimized for query
processing but also available for printing and display.
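
To make the query and secondary-document goals above concrete, here is a
minimal sketch in Python, not in the XSLT and Java that the project itself
names. It is not the project's code. It assumes element names in the style
of play.dtd (SPEECH, SPEAKER, LINE), uses placeholder text rather than the
play, and relies on the limited XPath subset supported by Python's
xml.etree.ElementTree. The helper names query and frequency_xml, and the
<frequencies>/<word> output elements, are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample in the style of play.dtd; the text is placeholder.
SAMPLE = """\
<PLAY>
  <ACT><TITLE>ACT I</TITLE>
    <SCENE><TITLE>SCENE I</TITLE>
      <SPEECH><SPEAKER>FIRST</SPEAKER>
        <LINE>the ship sails at dawn</LINE>
        <LINE>and the ship returns at dusk</LINE>
      </SPEECH>
      <SPEECH><SPEAKER>SECOND</SPEAKER>
        <LINE>no ship sails today</LINE>
      </SPEECH>
    </SCENE>
  </ACT>
</PLAY>
"""


def query(root, path, pattern):
    """Select elements structurally with `path`, then keep those whose
    text matches the regular expression `pattern`.

    Returns a list of (element, [matched strings]) pairs.
    """
    regex = re.compile(pattern)
    hits = []
    for elem in root.findall(path):
        # Collapse runs of whitespace so matching sees running text.
        text = " ".join("".join(elem.itertext()).split())
        found = [m.group(0) for m in regex.finditer(text)]
        if found:
            hits.append((elem, found))
    return hits


def frequency_xml(root, path):
    """Build a word-frequency list as an XML element (assumed layout)."""
    counts = Counter()
    for elem in root.findall(path):
        counts.update(re.findall(r"[a-z']+", "".join(elem.itertext()).lower()))
    freq = ET.Element("frequencies")
    # Most frequent first; ties broken alphabetically.
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        ET.SubElement(freq, "word", count=str(n)).text = word
    return freq


root = ET.fromstring(SAMPLE)

# Structural condition: LINEs inside speeches by FIRST; text pattern: "ship".
for line, found in query(root, ".//SPEECH[SPEAKER='FIRST']/LINE", r"\bship\b"):
    print(line.text, found)

# Secondary document: frequency counts serialized as XML.
print(ET.tostring(frequency_xml(root, ".//LINE"), encoding="unicode"))
```

Keeping the structural condition (the path) separate from the text pattern
(the regular expression) mirrors the division of labour described above; a
full XPath 1.0 engine would allow richer structural conditions than this
sketch does.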
We now have an early version of the tools and a tutorial on how to use
them, both to be found at
http://csproj.colgate.edu/TextTools.htm
Our main purpose in posting this announcement is to get feedback: what
other functionality is needed? how can the user interface be improved? We
are interested in collaborating with an ongoing project to try out ideas.
There are email addresses at the end of this message. Eventually, we would
like to make this an open source project.
The tutorial uses a very simple DTD (Jon Bosak's play.dtd), and a single
text, The Merchant of Venice. However, the program is DTD-independent.
The next version of the tutorial will use TEI Lite and provide
instructions on how to use the program with a DTD of your own.
Both the program and the tutorial have been prepared by Karthik Jayaraman,
following initial suggestions by Alexander Nakhimovsky. Karthik
(kjayaraman@mail.colgate.edu) is a senior undergraduate student, and
Nakhimovsky (sasha@cs.colgate.edu) is a faculty member in the computer
science department at Colgate. We will be giving a paper on our work at
XML-Europe in Paris in June. A poster and a software demo will be
presented at the ALLC/ACH meeting in Glasgow.
Alexander Nakhimovsky tel 315-228-7586
Computer Science Dpt fax 315-228-7004
Colgate University sasha@cs.colgate.edu or
Hamilton NY 13346 sasha@mail.colgate.edu
--[2]------------------------------------------------------------------
Date: Thu, 30 Mar 2000 11:03:11 +0100 (BST)
From: John Dawson <jld1@cam.ac.uk>
Subject: Re: Text Analysis Tools for XML documents: a Web application
At first sight, very impressive, and very useful.
A couple of comments:
(1) When searching a speech for a particular word, neither the immediate
results nor the expanded sources show which part of the play they come
from. If I search SPEECH for 'trip' I get one match, a speech by JESSICA.
Clicking on the ellipsis shows the complete scene, but doesn't say which
Act it's in.
(2) It would be a good idea to highlight the words searched for in the
results (with colour, preferably): if a complete speech is chosen as
the context, it can be quite long, and the chosen word difficult to
spot.
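
Both comments can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not how the Colgate tools work: it
assumes play.dtd-style ACT, SCENE and TITLE elements, uses placeholder
text, and the helper names locate and highlight are invented here.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of play.dtd; the text is placeholder.
SAMPLE = """\
<PLAY>
  <ACT><TITLE>ACT I</TITLE>
    <SCENE><TITLE>SCENE I</TITLE>
      <SPEECH><SPEAKER>FIRST</SPEAKER><LINE>nothing of note here</LINE></SPEECH>
    </SCENE>
  </ACT>
  <ACT><TITLE>ACT II</TITLE>
    <SCENE><TITLE>SCENE III</TITLE>
      <SPEECH><SPEAKER>SECOND</SPEAKER><LINE>a short trip &amp; back</LINE></SPEECH>
    </SCENE>
  </ACT>
</PLAY>
"""


def locate(root, elem):
    """Return the titles of the ACT and SCENE that enclose `elem`."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    titles = []
    node = parents.get(elem)
    while node is not None:
        if node.tag in ("ACT", "SCENE"):
            titles.append(node.findtext("TITLE", default="?"))
        node = parents.get(node)
    return " / ".join(reversed(titles))


def highlight(text, pattern):
    """Return HTML in which every match of `pattern` is coloured red."""
    out, last = [], 0
    for m in re.finditer(pattern, text):
        if m.start() == m.end():
            continue  # ignore empty matches
        out.append(html.escape(text[last:m.start()]))
        out.append('<span style="color:red">%s</span>' % html.escape(m.group(0)))
        last = m.end()
    out.append(html.escape(text[last:]))
    return "".join(out)


root = ET.fromstring(SAMPLE)
for speech in root.iter("SPEECH"):
    for line in speech.iter("LINE"):
        if re.search(r"\btrip\b", line.text or ""):
            # Comment (1): say where the hit is; comment (2): mark the word.
            print(locate(root, speech), "|", highlight(line.text, r"\btrip\b"))
```

Escaping the text around each match before wrapping it keeps characters
such as & safe when the result is shown in a Web page.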
Thanks. John
John Dawson work: JLD1@cam.ac.uk home: JLDawson@talk21.com
(01223) 335029 (01462) 893410
web: http://www.cus.cam.ac.uk/~jld1