16.383 call for papers: multilingual corpora

From: Humanist Discussion Group (by way of Willard McCarty willard.mccarty@kcl.ac.uk)
Date: Fri Dec 13 2002 - 03:16:28 EST


                   Humanist Discussion Group, Vol. 16, No. 383.
           Centre for Computing in the Humanities, King's College London
                         Submit to: humanist@princeton.edu

             Date: Fri, 13 Dec 2002 07:56:05 +0000
             From: Silvia Hansen <hansen@coli.uni-sb.de>
             Subject: First CfP: Pre-Conference Workshop on Multilingual Corpora

           Apologies to those of you who receive this more than once

                              ** CALL FOR PAPERS **

                              Multilingual Corpora:
                Linguistic Requirements and Technical Perspectives

         A pre-conference workshop to be held at
                            Corpus Linguistics 2003

            Lancaster, 27 March 2003



    Organisers:
    Stella Neumann (Department of Applied Linguistics, Translation and
    Interpreting)
    Silvia Hansen (Department of Computational Linguistics)

    Saarland University, Saarbrücken, Germany


    How do researchers go about building multilingual corpora? Two methods
    seem to prevail for developing a linguistically interpreted corpus that
    covers more than one language. In the first, the multilingual corpus is
    split into monolingual sub-corpora, which are then annotated
    independently. In the second, one language serves as the basis for
    building and interpreting the multilingual corpus, and the other
    languages have to be adapted to it. Both methods, however, are rather
    problematic. Neither sufficiently takes into account the differences and
    commonalities between the languages in question at each stage of
    corpus-based research: the comparability of the corpus design, the
    different kinds of segmentation, the diverging annotation schemes, the
    corpus representations and, finally, querying, where the languages have
    to converge again. Errors or inconsistencies introduced at one stage of
    multilingual corpus development carry over into the following steps and
    lead to more serious errors or inconsistencies there. These problems not
    only arise at each methodological step; they also multiply as the
    research design grows more complex. If the research aims at interpreting
    linguistic data on several levels, cross-linguistic comparability has to
    be taken into account on each level.

    The goal of the workshop is to bring together researchers who formulate
    specific requirements for working with corpora from a linguistic
    perspective and engineers who can offer technical solutions but need
    input from users to adapt their tools to the needs of linguists.
    Questions to be discussed in this context include the following:
    - What happens if the units under investigation diverge across the
      different levels?
    - At present, the preferred solution is to use XML at all stages and on
      all layers. But is this really practicable?
    - Are linguists comfortable working with stand-off mark-up?
    - Or is it merely a technical compromise?

    The workshop should result in a catalogue of requirements combined with
    technical solutions. This catalogue could then serve as a starting point
    for developing an annotation typology that takes into account different
    languages as well as different annotation layers. On the basis of such a
    typology, the comparability of a multilingual, multi-layer annotated
    corpus can be guaranteed. With it in hand, a builder of multilingual
    corpora should be able to cope with potential problems at each of the
    steps in corpus development outlined above.

    Papers are invited on the following topics:
    - linguistic requirements at the different methodological steps
    - state-of-the-art technical solutions
    - international standards that facilitate the development and exchange
      of multilingual corpora

    The workshop will last a full day and comprise about 8-10 papers.
    Presentations should be short, leaving enough time for discussion and
    assessment of the methodologies used as well as for developing possible
    solutions. This already points to the workshop agenda: the first third
    will deal with linguistic fundamentals, the second will discuss the
    technical aspects, and the last third will provide a platform for
    integrating both perspectives. Workshop proceedings will be


    to be announced!


    20 January 2003: Deadline for submissions
    21 February 2003: Notification of acceptance
    7 March 2003: Camera ready copy
    27 March 2003: Workshop


    Please refer to the main conference web page
    (http://www.comp.lancs.ac.uk/ucrel/cl2003) for registration details.


    Please send submissions in English as RTF or plain text files by email
    to the address below. Papers should be 8-10 pages long, as for the main
    conference (see http://www.comp.lancs.ac.uk/ucrel/cl2003/style.html
    for paper format guidelines).

    Stella Neumann (st.neumann@mx.uni-saarland.de)
    Department of Applied Linguistics, Translation and Interpreting (FR 4.6)

    Saarland University
    Postfach 15 11 50
    66041 Saarbrücken
