19.473 relational database and TEI

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Sat, 3 Dec 2005 10:15:08 +0000

               Humanist Discussion Group, Vol. 19, No. 473.
       Centre for Computing in the Humanities, King's College London
                   www.kcl.ac.uk/humanities/cch/humanist/
                        www.princeton.edu/humanist/
                     Submit to: humanist_at_princeton.edu

   [1] From: Manfred Thaller <manfred.thaller_at_uni-koeln.de> (59)
         Subject: Re: 19.470 relational database and TEI

   [2] From: Norman Hinton <hinton_at_springnet1.com> (15)
         Subject: Re: 19.470 relational database and TEI

   [3] From: Wendell Piez <wapiez_at_mulberrytech.com> (74)
         Subject: Re: 19.470 relational database and TEI

   [4] From: "Da Rold, Dr. O." <odr1_at_leicester.ac.uk> (51)
         Subject: thought

   [5] From: "Simone Albonico" <albonico_at_unipv.it> (89)
         Subject: R: 19.433 relational database and TEI?

--[1]------------------------------------------------------------------
         Date: Sat, 03 Dec 2005 09:54:02 +0000
         From: Manfred Thaller <manfred.thaller_at_uni-koeln.de>
         Subject: Re: 19.470 relational database and TEI

Dear Willard,
>We need to get out of the Shop of
>Solutions, all pre-packaged and shrinkwrapped, so that we can begin
>to imagine what no one yet knows.
yes, I could not agree more.

Two short comments.

First "Databases":

If one looks a few years back, one of the fundamental texts on data
modelling (Dionysios C. Tsichritzis and Frederick H. Lochovsky: Data
Models, Englewood Cliffs, 1982, 50ff.) was built upon the assumption
that <emph>any</emph> data model could be expressed either as a set
of pointer-related tables or as a graph. Well - XML is clearly a
case where a graph is a more useful representation than a set of tables.

I'd like to emphasize that I just mentioned <emph>data models</emph>,
not "databases". I emphasize this not out of terminological
overprecision, but because I have the feeling that when many people
speak about the relational data model, what they actually mean is "a
system which works fast and has nice input facilities". Whether a
database works fast, however, actually has no relationship to the
data model it is built upon, but to the indexing components it uses.
And I have in the meantime the suspicion that in about ninety-five
percent of the Humanities / Cultural Heritage projects which use
Oracle as a database to access data that was born XMLish
"because this professional relational system provides extremely fast
access", what is actually meant is "because it has a fast indexing
system with a robust interface". Forcing XML-structured data,
where closely related chunks of text usually reside close together on
the hardware, into a relational model, which is notorious for
spreading everything around without any concern for the semantics of
the relationships, is actually one of the slowest ways of processing
these data; that fact is then obscured because the indexing machine
does, indeed, work quite fast.
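
A minimal sketch of the point above, assuming Python with its standard
sqlite3 and xml.etree modules. The sample fragment, the "node" table
layout and the function names are all hypothetical illustrations, not
taken from any project mentioned here. The same small document is held
once as a tree and once "shredded" into relational rows; reading the
running text back from the rows takes repeated lookups per node, while
the tree yields it in a single traversal.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical TEI-like fragment; element names are illustrative only.
SAMPLE = "<div><head>Title</head><p>First <hi>marked</hi> text.</p><p>Second.</p></div>"


def shred(root):
    """Flatten the tree into (id, parent_id, position, tag, text, tail) rows."""
    rows = []

    def walk(node, parent_id, position):
        node_id = len(rows) + 1  # ids are assigned in document order
        rows.append((node_id, parent_id, position, node.tag,
                     node.text or "", node.tail or ""))
        for i, child in enumerate(node):
            walk(child, node_id, i)

    walk(root, None, 0)
    return rows


def rebuild_text(conn, node_id):
    """Reassemble the text below a node; every node visited costs extra queries."""
    (text,) = conn.execute(
        "SELECT text FROM node WHERE id = ?", (node_id,)).fetchone()
    children = conn.execute(
        "SELECT id, tail FROM node WHERE parent_id = ? ORDER BY position",
        (node_id,)).fetchall()
    for child_id, tail in children:
        text += rebuild_text(conn, child_id) + tail
    return text


root = ET.fromstring(SAMPLE)
rows = shred(root)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "position INTEGER, tag TEXT, text TEXT, tail TEXT)")
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?, ?)", rows)

# Tree form: neighbouring text is reached in one pass over the structure.
tree_text = "".join(root.itertext())
# Table form: the same text has to be stitched together row by row.
table_text = rebuild_text(conn, 1)

print(tree_text == table_text)
```

Both routes give the same text. What differs is the work needed to get
there, and an index on parent_id would speed up each lookup without
changing the shape of the model underneath.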

To break out of the deadlock, one would have to shift focus a bit
away from the question of "how do we markup?" to "what are the
adequate underlying, abstract, data structures which we express by
markup". (Data structures in the sense of implementable computer
science abstractions, which are related to, but <emph>not</emph>
identical to, intellectual philological / Humanities abstractions.)

Some readers may find interesting the ideas on the topic that I expressed
in a paper available at
http://www.hki.uni-koeln.de/people/thaller/currentArguments/

Second "Shrinkwrapping":

Shamelessly quoting myself, I have raised the point at a number of
recent conferences, like this:

<quote-of-conference-transparency-never-written-up-as-paper>

Humanities Computer Science strengthens the position of the Humanities.

Today the Humanities are frequently "consumers of information
technology". I.e.: they use concepts, tools and technologies which
have been developed by others.

To strengthen their position within research politics it is important
to make clear, that they can contribute genuinely to the further
development of computer science in general, specifically with
uncertain or fuzzy information, knowledge representation and decision
processes and with complex textual models.

</quote-of-conference-transparency-never-written-up-as-paper>

The full argument can be found at
http://www.hki.uni-koeln.de/events/amsterdam300904/index.html (item
"3. View", click on "HCS").

Best, Manfred

--[2]------------------------------------------------------------------
         Date: Sat, 03 Dec 2005 09:54:32 +0000
         From: Norman Hinton <hinton_at_springnet1.com>
         Subject: Re: 19.470 relational database and TEI

I asked, in another List devoted to uses of computers for English,
what had happened to the free-for-all, do-it-yourself spirit that I
found in "humanities computing" in the 70s and 80s, and the response
was that now there are already existing programs that we can use, so
that the old 'entrepreneurial' approach to humanities computing was
over. And that people could now get going on 'real work'.

This would seem to suggest that the attitudes Willard opposes here
(and I do too) have taken over.

As Willard would say -- comments ?

>As long as we think of ourselves merely as "end-users"
>or "end-appliers" of technologies invented elsewhere, we are no
>better than adherents to particular trendy schools of literary
>criticism, philosophy or whatever. We need to get out of the Shop of
>Solutions, all pre-packaged and shrinkwrapped, so that we can begin
>to imagine what no one yet knows.

--[3]------------------------------------------------------------------
         Date: Sat, 03 Dec 2005 09:55:05 +0000
         From: Wendell Piez <wapiez_at_mulberrytech.com>
         Subject: Re: 19.470 relational database and TEI

Dear Willard,

In reply both to your latest question and to Joris's very nicely
tempered response suggesting researchers investigate Ruby and Ruby-on-rails:

At 01:30 AM 12/2/2005, you wrote:
>As long as the researcher's task is put for purposes of research in
>terms of commitment to this or that existing technology, the story
>goes not even as happily as Homer's. Of course in a grant-funded
>project or other timetabled affair, choices must be made, things
>actually done by date X, this or that achieved. Rules of the game,
>the cost of having the grant etc. But (let me stubbornly emphasize
>again) *in terms of research* the humanities computing problem here
>would seem to be clarifying the emergent technology, not selecting an
>emerged one.

It is exactly in this spirit that I think we need to encourage such
initiatives as Joris's (and not just polemically, but through action
and the application of resources, where we have them). Ruby is
indeed, by all accounts, a "sweet deal" (as one might say
colloquially and not ironically :-), and when adherents of a
technology are singing of sweetness rather than apologetically
defending perceived ungainliness (as I was doing with XSLT), that
should always be taken as a good sign. Not that someone won't do
something ungainly with it eventually. But a good sign nonetheless.

Yet I think you are exactly right to wonder whether this open-ended
researcher's attitude is quite, erm, "affordable" by every HC
project. Not only do we not want to exclude from our work those who
contribute other talents and skills besides those of a Joris (how
would Orietta's team get on with the suggestion that they go
object-oriented?), but also (and thankfully so) we find that
"clarifying the emergent technology" is itself not the only and
inevitable goal. The problems Orietta faces will be there
irrespective of whether she selects one technology or the other, and
it is her project's contribution to the understanding of those
problems (which are not simply problems of which of several emerged
or emerging "solutions" she might select) that is the main point.
(With you, I put "solutions" in quotes since such a solution, we also
know, is really just another problem.)

You yourself have often remarked on the fine edge between the tool
and its use. Which tool we pick up must always remain a practical
problem, even when we know that, given the right skills and machines
to back us up, our selection might be different. Given how you
yourself have made significant contributions to deep theoretical work
in our field using tools hardly more capable than a spreadsheet, I
count you among the masters who understand this. Indeed, sometimes it
is the technology we had earlier considered fully "emerged" that is
suddenly clarified by a new application.

> As long as we think of ourselves merely as "end-users"
>or "end-appliers" of technologies invented elsewhere, we are no
>better than adherents to particular trendy schools of literary
>criticism, philosophy or whatever. We need to get out of the Shop of
>Solutions, all pre-packaged and shrinkwrapped, so that we can begin
>to imagine what no one yet knows.

Well yes, though I confess (still feeling a bit of the academic
exile's envy for life inside the cloisters) I'm surprised to hear
that "adherents of particular trendy schools" are to be so casually
disparaged by us. (Not that I don't have differences with one or
another trendy school. But surely it's the adherence to trendiness
you react against, not the schools themselves.) Criticism of
substance as mere fashion goes both ways; I doubt there's a serious
subscriber to HUMANIST who hasn't been accused, at some point, of
mindlessly following a trend, since that's obviously what all this
computer stuff really is. Yet sometimes the difference between the
glitzy mall storefront and the tinker's workshop is harder to make
out than we pretend.

Cheers,
Wendell

======================================================================
Wendell Piez mailto:wapiez_at_mulberrytech.com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
    Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

--[4]------------------------------------------------------------------
         Date: Sat, 03 Dec 2005 10:00:14 +0000
         From: "Da Rold, Dr. O." <odr1_at_leicester.ac.uk>
         Subject: thought

Willard,

I can certainly agree with your point below. In particular the last
paragraph expresses my frustration of the past months.

I have an anecdote. When I raised this issue, I was told: 'you want to
run before you can walk'. To that I would have wanted to answer: 'but we
walk so that we can run'. Without this impetus for running, we probably
would never have walked. My answer would not have been understood, since
for them providing ready-made solutions was the easiest and most practical
thing to do.

Moreover, how can this be explained to funding bodies and the like? The
end product is paramount and we, as researchers, have to compromise for it,
although deep down we know that it is not quite right, as our job
is to run and not to walk! Perhaps one day funding bodies will give
money to projects just because they want to run.

Great discussion nevertheless.

All best wishes,

Orietta

--[1]------------------------------------------------------------------
           Date: Fri, 02 Dec 2005 06:20:16 +0000
           From: Willard McCarty <willard.mccarty_at_kcl.ac.uk>

A brief polemic, a pot-stirring.

It seems to me that the situation Orietta Da Rold has described is not
entirely dissimilar to the Homeric story of Scylla and Charybdis
-- the Scylla of database design vs the Charybdis of TEI -- or the other
way around, as you please. What I'd think the *researcher* should be
doing is getting as clear as possible the nature of the tool that would
best answer to the problem at hand -- in this case as well as mine, it
seems, a tool that does not yet exist. How else, I wonder, can we
progress -- i.e. imagine new tools we don't already know how to build?

As long as the researcher's task is put for purposes of research in
terms of commitment to this or that existing technology, the story goes
not even as happily as Homer's. Of course in a grant-funded project or
other timetabled affair, choices must be made, things actually done by
date X, this or that achieved. Rules of the game, the cost of having the
grant etc. But (let me stubbornly emphasize
again) *in terms of research* the humanities computing problem here
would seem to be clarifying the emergent technology, not selecting an
emerged one. As long as we think of ourselves merely as "end-users"
or "end-appliers" of technologies invented elsewhere, we are no better
than adherents to particular trendy schools of literary criticism,
philosophy or whatever. We need to get out of the Shop of Solutions, all
pre-packaged and shrinkwrapped, so that we can begin to imagine what no
one yet knows.

Comments?

Yours,
WM

Dr Willard McCarty | Reader in Humanities Computing | Centre for
Computing in the Humanities | King's College London | Kay House, 7
Arundel Street | London WC2R 3DX | U.K. | +44 (0)20 7848-2784 fax:
-2980 || willard.mccarty_at_kcl.ac.uk www.kcl.ac.uk/humanities/cch/wlm/

--[5]------------------------------------------------------------------
         Date: Sat, 03 Dec 2005 10:01:15 +0000
         From: "Simone Albonico" <albonico_at_unipv.it>
         Subject: R: 19.433 relational database and TEI?

At the University of Pavia, between 2001 and 2004, we worked on a
project to catalogue Renaissance miscellaneous manuscripts containing poems
(ALI-MAMIR project), in the context of several ongoing research efforts on
the Antologie della Lirica Italiana (<http://ali.unipv.it/>); the level of
description of the manuscripts required by the project is very detailed,
especially where the content of the manuscript is concerned. We first
catalogued more than 25 manuscripts with the help of an Access
database (using 4 cataloguers). Then we started thinking about the
integration of our data with the TEI/MASTER markup model. This issue
and the description model at which we arrived are discussed in the
following paper (in Italian):

Simone Albonico, I manoscritti miscellanei di rime quattro-cinquecenteschi.
Il progetto di ricerca ALI-MAMIR (MAnoscritti MIscellanei Rinascimentali),
in La lirica del Cinquecento. Seminario di studi in memoria di Cesare
Bozzetti, Edizioni dell'Orso, Alessandria 2004, pp. 221-250

The paper contains an XML/TEI markup model of the description of a manuscript
and its content. Following the description model developed there, we started
restructuring the Access database that we initially used. The structure
of the second version of the database was designed to allow, at a later
stage, the export of all the data entered and their restructuring inside
an XML document compliant with the TEI-MASTER DTD (2003). Unfortunately, at
the moment we do not have a working version of this second
database: the data structure is 95-99% defined (for a total of 45/50
tables); the import of data from the first to the second version of the
database has been done, but the data must still be completed (the new
structure requires the integration of other data); in the end, the biggest
obstacle was the graphical interface. In order to reach the level of detail
that we wanted (and to cope with the huge variety of cases occurring in
manuscripts) we tried to develop software that helps the compiler (also
with a view to extending the project to include the contributions of other
cataloguers), but we had to suspend the project (we hope only temporarily).
This project of course implies the export of the data to the web, which we
would do in line with other projects already done in Pavia, that is, with a
PostgreSQL database and a PHP interface (see the "twin" project about 16th
century printed anthologies: <http://rasta.unipv.it/>).
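
To illustrate the export step described above, here is a minimal sketch,
assuming Python with its standard sqlite3 and xml.etree modules. It is not
the ALI-MAMIR schema: the two tables, their columns and all the records are
invented placeholders, and the output is not validated against the
TEI-MASTER DTD. Only the element names (msDescription, msIdentifier,
msContents, msItem and so on) follow the TEI/MASTER vocabulary.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Hypothetical two-table layout with placeholder records.
conn.executescript("""
CREATE TABLE manuscript (
    id INTEGER PRIMARY KEY, settlement TEXT, repository TEXT, shelfmark TEXT);
CREATE TABLE item (
    id INTEGER PRIMARY KEY, ms_id INTEGER REFERENCES manuscript(id),
    seq INTEGER, locus TEXT, author TEXT, incipit TEXT);
INSERT INTO manuscript VALUES (1, 'Example City', 'Example Library', 'MS 1');
INSERT INTO item VALUES (1, 1, 2, 'ff. 3r-4v', 'Author B', 'Placeholder incipit two');
INSERT INTO item VALUES (2, 1, 1, 'ff. 1r-2v', 'Author A', 'Placeholder incipit one');
""")


def export_manuscript(conn, ms_id):
    """Build an msDescription element from the rows for one manuscript."""
    settlement, repository, shelfmark = conn.execute(
        "SELECT settlement, repository, shelfmark FROM manuscript WHERE id = ?",
        (ms_id,)).fetchone()

    desc = ET.Element("msDescription")
    ident = ET.SubElement(desc, "msIdentifier")
    ET.SubElement(ident, "settlement").text = settlement
    ET.SubElement(ident, "repository").text = repository
    ET.SubElement(ident, "idno").text = shelfmark

    contents = ET.SubElement(desc, "msContents")
    # Rows carry no inherent order, so document order comes from the seq column.
    query = ("SELECT locus, author, incipit FROM item "
             "WHERE ms_id = ? ORDER BY seq")
    for locus, author, incipit in conn.execute(query, (ms_id,)):
        item = ET.SubElement(contents, "msItem")
        ET.SubElement(item, "locus").text = locus
        ET.SubElement(item, "author").text = author
        ET.SubElement(item, "incipit").text = incipit
    return desc


desc = export_manuscript(conn, 1)
xml_text = ET.tostring(desc, encoding="unicode")
print(xml_text)
```

A real description at the level of detail mentioned above would need many
more tables and nested elements; the sketch only shows the general shape
of such an export.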

If you are interested in more information or even just in an exchange of
thoughts about our experience (and the problems we met), you can contact
Maria Finazzi (finazzi_at_ada2.unipv.it or maria.finazzi_at_codexcoop.it).

Apologies for any mistakes in my English; do feel free to ask for
clarification if something is unclear. For the sake of
clarity, the Italian version of this e-mail follows.

Best regards
Simone Albonico

All'università di Pavia, nel periodo 2001-2004 abbiamo sviluppato un
progetto per la schedatura di manoscritti miscellanei di rime del
Rinascimento (progetto ALI-MAMIR), nell'ambito di diverse ricerche sulle
Antologie della Lirica Italiana (<http://ali.unipv.it/>); il livello di
descrizione dei manoscritti previsto dal progetto è molto dettagliato,
soprattutto per quanto riguarda il contenuto dei manoscritti. Utilizzando un
primo database Access abbiamo catalogato più di 25 manoscritti (impiegando 4
schedatori). Successivamente ci siamo posti il problema di una integrazione
dei nostri dati con lo schema di marcatura TEI/MASTER. Su questa
problematica e sul modello di descrizione a cui siamo arrivati alla fine
esiste un contributo uscito a stampa nel 2004 (in italiano):

Simone Albonico, I manoscritti miscellanei di rime quattro-cinquecenteschi.
Il progetto di ricerca ALI-MAMIR (MAnoscritti MIscellanei Rinascimentali),
in La lirica del Cinquecento. Seminario di studi in memoria di Cesare
Bozzetti, Edizioni dell'Orso, Alessandria 2004, pp. 221-250

L’articolo contiene anche un modello di marcatura XML/TEI della descrizione
di un manoscritto e del suo contenuto. Sulla scorta del modello di
descrizione ivi elaborato, abbiamo avviato una ristrutturazione del database
Access utilizzato inizialmente, la cui struttura è stata studiata in modo
tale da consentire, in un secondo tempo, l'esportazione dei dati inseriti e
una loro rimodulazione all'interno di un documento XML conforme alla
TEI-MASTER DTD (2003). Al momento purtroppo non disponiamo di una versione
funzionante del secondo modello del database: la struttura dei dati è
definita al 95-99% (per un totale di 45/50 tabelle); l'importazione dei dati
dal vecchio al nuovo database è stata effettuata ma i dati vanno ancora
completati (la nuova struttura richiede infatti un'integrazione); alla fine
il maggior ostacolo è risultato l'interfaccia grafica. Dato il tipo di
dettaglio che volevamo aggiungere (e la enorme varietà di situazioni che si
presentano di fronte a materiale manoscritto) abbiamo cercato di sviluppare
un sistema software che fornisse diversi aiuti al compilatore (anche per la
prospettiva di allargare il progetto al contributo di altri studiosi), ma
abbiamo dovuto (speriamo solo temporaneamente), lasciare il lavoro in
sospeso.

Il progetto prevede naturalmente un'esportazione dei dati su web, che
effettueremmo con modalità già adottate a Pavia per altri progetti: database
PostgreSQL e interfaccia di visualizzazione in PHP (cfr. il progetto
"gemello" sulle raccolte a stampa cinquecentesche:
<http://rasta.unipv.it/>).

Se vi interessano ulteriori informazioni o anche solo uno scambio di vedute
sulla nostra esperienza (e le difficoltà incontrate) potete contattare Maria
Finazzi (finazzi_at_ada2.unipv.it o maria.finazzi_at_codexcoop.it).
Received on Sat Dec 03 2005 - 05:37:29 EST
