17.151 new issue of LLC

From: Humanist Discussion Group (by way of Willard McCarty willard.mccarty@kcl.ac.uk)
Date: Tue Jul 15 2003 - 01:46:22 EDT


                   Humanist Discussion Group, Vol. 17, No. 151.
           Centre for Computing in the Humanities, King's College London
                         Submit to: humanist@princeton.edu

             Date: Tue, 15 Jul 2003 06:35:59 +0100
             From: Edward Vanhoutte <evanhoutte@kantl.be>
             Subject: Literary and Linguistic Computing - TOC 18/1

    Literary and Linguistic Computing -- Table of Contents Alert

    A new issue of Literary and Linguistic Computing has been made available:

    April 2003; Vol. 18, No. 1

    URL: http://www3.oup.co.uk/litlin/hdb/Volume_18/Issue_01/

    Marilyn Deegan, p. 1

    - Introduction: New Directions in Humanities Computing
    David Robey, pp. 3-9

    - Towards the User: The Digital Edition of the Deutsche Wörterbuch by
    Jacob and Wilhelm Grimm
    Ruth Christmann and Thomas Schares, pp. 11-22
    Since February 2002, a first version of the Deutsche Wörterbuch (DWB) by
    Jacob and Wilhelm Grimm has been available on the web. A CD-ROM beta
    version has been available since December 2002. This paper will focus on
    the steps involved in drawing up an electronic version of the DWB and,
    by demonstrating the design of the Graphical User Interface (GUI), will
    show how common standards of digitization were taken into account and
    user needs were anticipated during the production process. The history
    and structure of the DWB will be outlined first to point out some
    characteristics of the dictionary. The process of retrodigitization from
    printed page to electronic dictionary will be briefly described and,
    while giving an overview of the DWB GUI, the importance of content-based
    markup and a user-friendly but powerful GUI as a necessary precondition
    for sensible and effective access to the dictionary contents will be
    stressed. The title of this paper, Towards the User, can thus be
    interpreted in two ways: during the digitization of the DWB, we consider
    the needs of the users, and by digitization, we hope to open up this
    huge amount of data and lexicological information for researchers.

    - The Scottish Corpus of Texts and Speech: Problems of Corpus Design
    Fiona M. Douglas, pp. 23-37
    In recent years, the use of large corpora has revolutionized the way we
    study language. There are now numerous well-established corpus projects,
    which have set the standard for future corpus-based research. As more
    and more corpora are developed and technology continues to offer greater
    and greater scope, the emphasis has shifted from corpus size to
    establishing norms of good practice. There is also an increasingly
    critical appreciation of the crucial role played by corpus design.
    Corpus design can, however, present peculiar problems for particular
    types of source material. The Scottish Corpus of Texts and Speech
    (SCOTS) is the first large-scale corpus project specifically dedicated
    to the languages of Scotland, and therefore it faces many unanswered
    questions, which will have a direct impact on the corpus design. The
    first phase of the project will focus on the language varieties Scots
    and Scottish English, varieties that are themselves notoriously
    difficult to define. This paper outlines the complexities of the
    Scottish linguistic situation, before going on to examine the
    problematic issue of how to construct a well-balanced and representative
    corpus in what is largely uncharted territory. It argues that a
    well-formed corpus cannot be constructed in a linguistic vacuum, and
    that familiarity with the overall language population is essential
    before effective corpus sampling techniques, methodologies, and
    categorization schema can be devised. It also offers some preliminary
    methodologies that will be adopted by SCOTS.

    - A Logic Programming Environment for Document Semantics and Inference
    David Dubin, Allen Renear, C. M. Sperberg-McQueen and Claus Huitfeldt,
    pp. 39-47
    Markup licenses inferences about a text. But the information warranting
    such inferences may not be entirely explicit in the syntax of the markup
    language used to encode the text. This paper describes a Prolog
    environment for exploring alternative approaches to representing facts
    and rules of inference about structured documents. It builds on earlier
    work proposing an account of how markup licenses inferences, and of what
    is needed in a specification of the meaning of a markup language. Our
    system permits an analyst to specify facts and rules of inference about
    domain entities and properties as well as facts about the markup syntax,
    and to construct and test alternative approaches to translation between
    representation layers. The system provides a level of abstraction at
    which the performative or interpretive meaning of the markup can be
    explicitly represented in machine-readable and executable form.
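
    As a rough illustration of the idea that markup licenses inferences
    beyond what its syntax states, the sketch below separates facts read
    off the markup from a rule applied to them. It is written in Python
    rather than the Prolog used by the authors, and it is not their system:
    the sample document, the `lang` attribute, and the inheritance rule
    are assumptions chosen only for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative only.
SAMPLE = '<text lang="en"><p id="p1">Hello</p><p id="p2" lang="de">Hallo</p></text>'


def syntax_facts(root):
    """Collect facts about the markup syntax: parent links and explicit lang values."""
    parent_of = {}
    explicit_lang = {}
    for node in root.iter():
        if "lang" in node.attrib:
            explicit_lang[node] = node.attrib["lang"]
        for child in node:
            parent_of[child] = node
    return parent_of, explicit_lang


def infer_language(node, parent_of, explicit_lang):
    """Assumed rule: a node is in the language it declares, else it inherits from its parent."""
    while node is not None:
        if node in explicit_lang:
            return explicit_lang[node]
        node = parent_of.get(node)
    return None


root = ET.fromstring(SAMPLE)
parent_of, explicit_lang = syntax_facts(root)
# Map each paragraph id to the language inferred for it.
inferred = {p.get("id"): infer_language(p, parent_of, explicit_lang)
            for p in root.iter("p")}
print(inferred)
```

    The first paragraph carries no language attribute of its own, so its
    language is an inference from the enclosing element, not a fact stated
    in its tag.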

    - Forensic Linguistics: its Contribution to Humanities Computing
    László Hunyadi, Kálmán Abari and Enikő Tóth, pp. 49-62
    The paper is a report on a case in forensic linguistics in which
    linguistic and computational approaches are combined to answer the
    question of whether it can be proved that a digital recording has been
    tampered with. With the growing use of digital applications, the chances
    of digital forgery are increasing significantly. Accordingly, the
    detection of tampering with audio recordings is also becoming an
    important task for forensic linguists. In the given case, we assumed
    that the most straightforward way of tampering with the given digital
    audio recording might have been the removal of some material and so our
    aim was to identify the location of this kind of tampering in the file.
    Due to the complexity of the given task the approach presented is
    interdisciplinary: first, it uses a traditional semantic analysis to
    identify possible discontinuous segments of the recorded text; secondly,
    it introduces an experimental phonetic approach to identify cues of the
    digital cutting of the audio signal; thirdly, it applies statistical
    calculations to specify the bit-level characteristics of audio
    recordings. The combination of these measurements proved to be quite
    helpful in answering the initial question, and the proposed new
    methodologies can be used in further areas of linguistics and

    - The Publication of Archaeological Excavation Reports Using XML
    Christiane Meckseper and Claire Warwick, pp. 63-75
    This paper looks at the usability of XML for the electronic publication
    of field reports by commercial archaeological units. The field reports
    fall into the field of grey literature as they are produced as client
    reports by commercial units as part of the planning process and do not
    receive official publication or widespread dissemination. The paper uses
    a small commercial unit called ARCUS at the University of Sheffield as a
    case study, marking up a sample excavation report using XML and
    the TEI Lite DTD. It also looks at the possibility of incorporating
    controlled archaeological vocabulary into the DTD. The paper comes to
    the conclusion that the electronic publication of grey reports would be
    very useful as it would allow a quicker response time and a rapid
    dissemination of information within the fast-moving and changing
    environment of commercial archaeology. XML would be a useful tool for
    the publication of field reports as it would allow practitioners to
    selectively download separate sections of field reports that are of
    particular importance to them and to improve the searchability of
    reports on the web. It is recognized that national archaeological
    institutions will also have to accept electronic versions of field
    reports in order for them to be able to be built into the financial
    framework of a commercial project design.

    - METAe -- Automated Encoding of Digitized Texts
    Birgit Stehno, Alexander Egger and Gregor Retti, pp. 77-88
    This paper explains why and how the digitization project METAe applies
    METS (Metadata Encoding and Transmission Standard) as an encoding scheme
    for automatically extracted metadata. In contrast to TEI (Text Encoding
    Initiative) and other markup languages, METS allows encoding of the
    whole range of structural, descriptive, and administrative metadata in a
    systematic way. As the METS schema permits the integration of other
    existing standards, it provides a highly flexible output that can be
    converted easily to the individual needs of digital libraries. An
    innovative aspect of the METAe data structure is the ALTO file
    ('Analysed layout and text object'), which contains the layout
    structures as well as the text passages of book pages. Structural maps
    of the METS schema are used to compose the logical and the physical
    structures out of ALTO and image files.

    - Testing Structural Properties in Textual Data: Beyond Document
    Grammars
    Felix Sasaki and Jens Pönninghaus, pp. 89-100
    Schema languages concentrate on grammatical constraints on document
    structures, i.e. hierarchical relations between elements in a tree-like
    structure. In this paper, we complement this concept with a methodology
    for defining and applying structural constraints from the perspective of
    a single element. These constraints can be used in addition to the
    existing constraints of a document grammar. There is no need to change
    the document grammar. Using a hierarchy of descriptions of such
    constraints allows for a classification of elements. These are important
    features for tasks such as visualizing, modelling, querying, and
    checking consistency in textual data. A document containing descriptions
    of such constraints we call a 'context specification document' (CSD). We
    describe the basic ideas of a CSD, its formal properties, the path
    language we are currently using, and related approaches. Then we show
    how to create and use a CSD. We give two example applications for a CSD.
    Modelling co-referential relations between textual units with a CSD can
    help to maintain consistency in textual data and to explore the
    linguistic properties of co-reference. In the area of textual,
    non-hierarchical annotation, several annotations can be held in one
    document and interrelated by the CSD. In the future we want to explore
    the relation and interaction between the underlying path language of the
    CSD and document grammars.
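
    A minimal sketch of what a constraint stated from the perspective of a
    single element might look like, checked alongside a document grammar
    and without changing it. This is not the authors' CSD format or path
    language: the element names (`np`, `ref`), the `target` attribute, and
    the two constraints are hypothetical, and Python's standard
    ElementTree module stands in for their tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated text: each <ref> points at a co-referent <np> by id.
DOC = """<doc>
  <p><np id="n1">the dictionary</np> was digitized.</p>
  <p><ref target="n1">It</ref> is online; <ref target="n9">they</ref> agree.</p>
</doc>"""


def check_refs(root):
    """Check two assumed constraints on every <ref> element.

    1. Its target attribute must match the id of some element in the document.
    2. It must have a <p> element among its ancestors.
    """
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    parent_of = {child: parent for parent in root.iter() for child in parent}
    problems = []
    for ref in root.iter("ref"):
        target = ref.get("target")
        if target not in ids:
            problems.append(f"ref -> {target}: no such id")
        # Walk upwards until a <p> ancestor is found or the root is passed.
        ancestor = parent_of.get(ref)
        while ancestor is not None and ancestor.tag != "p":
            ancestor = parent_of.get(ancestor)
        if ancestor is None:
            problems.append(f"ref -> {target}: not inside a p element")
    return problems


problems = check_refs(ET.fromstring(DOC))
print(problems)
```

    Both constraints look outward from the `ref` element itself, which is
    the kind of check a grammar of parent-child content models does not
    express directly.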

    - The Versioning Machine
    Susan Schreibman, Amit Kumar and Jarom McDonald, pp. 101-107
    This article describes the background and architecture of The Versioning
    Machine, a software tool designed to display and compare multiple
    versions of texts. The display environment provides for features
    traditionally found in codex-based critical editions, such as annotation
    and introductory material. It also takes advantage of opportunities
    afforded by electronic publishing, such as providing a frame to compare
    diplomatic versions of witnesses side by side, allowing for
    manipulatable images of the witness to be viewed alongside the
    diplomatic edition, and providing users with an enhanced typology of

    - Minutes of the Annual General Meeting of the Association for Literary
    and Linguistic Computing held at Tübingen, Germany on 27 July 2002,
    pp. 109-111

    - Treasurer's Report: Financial year January to December 2002
    Jean Anderson, pp. 112-114


    =============
    Edward Vanhoutte
    Co-ordinator
    Centrum voor Teksteditie en Bronnenstudie - CTB (KANTL)
    Centre for Scholarly Editing and Document Studies
    Reviews Editor, Literary and Linguistic Computing
    Koninklijke Academie voor Nederlandse Taal- en Letterkunde
    Royal Academy of Dutch Language and Literature
    Koningstraat 18 / b-9000 Gent / Belgium
    tel: +32 9 265 93 51 / fax: +32 9 265 93 49
    evanhoutte@kantl.be
    http://www.kantl.be/ctb/
    http://www.kantl.be/ctb/vanhoutte/
