17.005 POS tagging for Latin, consumptively viewed

From: Humanist Discussion Group (by way of Willard McCarty willard.mccarty@kcl.ac.uk)
Date: Thu May 08 2003 - 01:53:20 EDT


                    Humanist Discussion Group, Vol. 17, No. 5.
           Centre for Computing in the Humanities, King's College London
                       www.kcl.ac.uk/humanities/cch/humanist/
                         Submit to: humanist@princeton.edu

             Date: Thu, 08 May 2003 06:45:27 +0100
             From: Neven Jovanovic <neven.jovanovic@zg.tel.hr>
             Subject: Re: 16.610 POS tagging for Latin?

    Some time ago, a member of the list asked as follows:

    >I am looking for something equivalent to the CLAWS POS tagger that will work
    >with a Latin text. I poked around on the web but nothing leaped out.

    This is, of course, a problem of inflected languages (with relatively free
    word order), and it is connected with the problem of parsing (in Latin,
    Russian, Croatian, Greek...). As far as I know, there is some research into
    parsers for Latin (the Italian LEMLAT project, the Portuguese OLISSIPO
    project--both traceable on the WWW), but it still seems to linger at a
    purely academic level (restricted to certain word types or to certain text
    groups; in any case, nothing readily available to us end-users).

    However, this seems related to the _consumptive humanities_ theme. If I
    want to parse, or to tag the parts of speech in, a Latin text or
    texts--do I build a parser first, or do I do it the _old-fashioned_ way,
    relying on human linguistic intelligence? The first way has an obvious
    advantage--once I have built the parser, I can sell it, with the necessary
    adaptations, to any and all who need to parse or spellcheck an inflected
    language, and get quite comfortably rich (so I can even devote the rest of
    my life to purely academic classical philology).

    Neven


