--- tomd: From sales literature sent via hardcopy mail, I learned that
Prof Kimmo Koskenniemi is one of the "founders and principal owners of
Lingsoft." He developed a "two-level model" of morphological analysis
that seems to be popular as the basis of software for morphological
analysis.

-------------------------------------------------------------
From Richard Sproat of AT&T's Linguistics Research Department
-------------------------------------------------------------
Internet address: rws@research.att.com

Probably the best and most general available commercial software for
doing this kind of thing is PC-KIMMO, which you can actually get for
free by anonymous FTP. I enclose some info (dated January 92 -- I
assume it still holds) on that below. There is also a book to go with
that by Evan Antworth, which you can get from the Summer Institute of
Linguistics (address below).

For more general discussion of various methods for doing computational
morphology, you can also consult two recent MIT Press Books:

1. Computational Morphology: Practical mechanisms for the English
   lexicon. By Graeme D. Ritchie, Graham J. Russell, Alan W. Black and
   Stephen G. Pulman. ACL-MIT Press Series in Natural Language
   Processing. Cambridge, Massachusetts: MIT Press, 1992

2. And my own 1992 book in the same series, Morphology and Computation.
   Mine covers a wider variety of stuff than does the Ritchie et al.
   book.

Richard Sproat
Linguistics Research Department
AT&T Bell Laboratories             | tel (908) 582-5296
600 Mountain Avenue, Room 2d-451   | fax (908) 582-7308
Murray Hill, NJ 07974, USA         | rws@research.att.com

--- TomD: Richard also enclosed a lengthy "news" item on PC-KIMMO from
Evan Antworth. It seemed a bit too long to include here, but see the
next item *from* Evan Antworth.

---------------------------------------------------------------------
From Evan Antworth of Academic Computing Department, (institution???)
---------------------------------------------------------------------
Internet address: evan.antworth@sil.org

Here is some information on PC-KIMMO, a program for morphological
parsing. It has been reviewed in _Computational Linguistics_ 17:2, June
1991 and also in _Computers and the Humanities_ 26:2, April 1992. We
provide the C source code with the intention that it be used in
programs developed by the user. Of course, I cannot say whether or not
it could successfully be used in your application. Let me know if I can
help you further.

Evan Antworth
evan.antworth@sil.org

------------------------------------------

PC-KIMMO: A Two-level Processor for Morphological Analysis

WHAT IS PC-KIMMO?

PC-KIMMO is a new implementation for microcomputers of a program dubbed
KIMMO after its inventor Kimmo Koskenniemi (see Koskenniemi 1983). It is
of interest to computational linguists, descriptive linguists, and those
developing natural language processing systems. The program is designed
to generate (produce) and/or recognize (parse) words using a two-level
model of word structure in which a word is represented as a
correspondence between its lexical level form and its surface level
form.

Work on PC-KIMMO began in 1985, following the specifications of the LISP
implementation of Koskenniemi's model described in Karttunen 1983. The
coding has been done in Microsoft C by David Smith and Stephen McConnel
under the direction of Gary Simons and under the auspices of the Summer
Institute of Linguistics. The aim was to develop a version of the
two-level processor that would run on an IBM PC compatible computer and
that would include an environment for testing and debugging a linguistic
description.

The PC-KIMMO program is actually a shell program that serves as an
interactive user interface to the primitive PC-KIMMO functions. These
functions are available as a C-language source code library that can be
included in a program written by the user.
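
The correspondence between a lexical form and a surface form described
above can be illustrated with a minimal Python sketch. This is not
PC-KIMMO's rule notation and not its C library: the names
FEASIBLE_PAIRS, lexical_form, surface_form, y_to_i_rule and is_valid are
hypothetical, and the single toy rule is an assumption chosen only to
show how a pairing such as lexical "spy+s" with surface "spies" could be
checked symbol by symbol.

```python
# Toy illustration of a two-level correspondence.
# Assumption: this is NOT PC-KIMMO's rule format or API, only a sketch.

NULL = "0"  # stands for "no symbol" on one of the two levels

# Hypothetical set of allowed (lexical, surface) symbol pairs.
FEASIBLE_PAIRS = {
    ("s", "s"), ("p", "p"), ("y", "y"),
    ("y", "i"),    # lexical y realized as surface i
    ("+", NULL),   # morpheme boundary has no surface symbol
    (NULL, "e"),   # surface e with no lexical counterpart
}


def lexical_form(pairs):
    """Read the lexical level off an aligned list of pairs."""
    return "".join(lex for lex, _ in pairs if lex != NULL)


def surface_form(pairs):
    """Read the surface level off an aligned list of pairs."""
    return "".join(surf for _, surf in pairs if surf != NULL)


def y_to_i_rule(pairs):
    """Toy rule: y:i is allowed only directly before the pair +:0."""
    for i, pair in enumerate(pairs):
        if pair == ("y", "i"):
            if i + 1 >= len(pairs) or pairs[i + 1] != ("+", NULL):
                return False
    return True


def is_valid(pairs):
    """A correspondence is valid if every pair is feasible and the rule holds."""
    return all(p in FEASIBLE_PAIRS for p in pairs) and y_to_i_rule(pairs)


# Aligned correspondence: lexical s p y + 0 s / surface s p i 0 e s
spies = [("s", "s"), ("p", "p"), ("y", "i"), ("+", NULL), (NULL, "e"), ("s", "s")]

print(lexical_form(spies), "<->", surface_form(spies), is_valid(spies))
```

Because the same aligned pairs yield both levels, one description can in
principle be used in either direction: generation starts from the
lexical side, recognition from the surface side.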
[tomd: much text deleted]

HOW TO CONTACT US

PC-KIMMO is a research project in progress, not a finished commercial
product. In this spirit, we invite your response to the software and
the book. Please direct your comments to:

    Academic Computing Department
    PC-KIMMO project
    7500 W. Camp Wisdom Road
    Dallas, TX 75236 U.S.A.

    phone: 214/709-3346, -2418
    email: evan.antworth@sil.org (Evan Antworth)

REFERENCES

Antworth, Evan L. 1990. PC-KIMMO: a two-level processor for
    morphological analysis. Occasional Publications in Academic
    Computing No. 16. Dallas, TX: Summer Institute of Linguistics.
    ISBN 0-88312-639-7, 273 pages, paperbound.

Karttunen, Lauri. 1983. KIMMO: a general morphological processor.
    Texas Linguistic Forum 22:163-186.

Koskenniemi, Kimmo. 1983. Two-level morphology: a general computational
    model for word-form recognition and production. Publication No. 11.
    University of Helsinki: Department of General Linguistics.

----------------------
From Ian Hersey of IBM
----------------------
Internet Address: hersey@vnet.IBM.COM

We do have a system that both lemmatizes ("stems") and generates all
inflected forms, and it is available for about 19 European languages.
We also do lemmatization for Japanese. The code is language-independent:
you just plug in the dictionary you need and go from there. This same
service also performs hyphenation (not for Japanese -- it isn't ever
hyphenated) and spell-checking. This system is available for Windows,
OS/2, AIX, VM and MVS.

I should mention that our morphological processing only handles
inflectional morphology: "compute" can generate "computes", "computed"
and "computing" (all forms of the verb "to compute"), but it will not
generate "computer". The "-er" and other affixes that change the part of
speech are known as derivational morphology, and our service doesn't
handle that area (yet).

I'm not the one to give pricing information. Please contact Brian Gessel
at 301-803-2943 for that; he's our business person.
He can also provide you with an OEM fact sheet that lists all of the
languages and sizes.

Regards,
Ian

--------------------------------------------------------
From Daniel Stieger of Institut fuer Informationssysteme
--------------------------------------------------------
Internet Address: stieger@inf.ethz.ch

[tomd: Dani Stieger is responding to a query regarding a German language
stemmer based on the "Porter algorithm."]

As I mentioned to your colleague there is no serious report about our
experiments. I am in possession of a "Semester Work" (a short report
performed by a student) about this subject. It is NOT available in
machine readable form [tomd: text deleted] AND ... it is written in
GERMAN. The report also contains a listing of the German Porter
algorithm (written in MODULA-2 !!). Furthermore, you need the
decomposition of German words so that you are really stemming the right
(ending) part of the word (as you know, German words may be composed of
several words). For the decomposition I used an automatically generated
dictionary (215'000 German words).

[tomd: text deleted]

>You mention "Porter (1983)." Can you send me the full citation? Is
>there some way we can get the source of your experiments with the
>algorithm?
>

M.F. Porter: An Algorithm for Suffix Stripping. Program, Vol. 14,
No. 3, 1980, pp. 130-137.

[tomd: text deleted]

Dani

************************************************************************
Daniel Stieger                      stieger@inf.ethz.ch
Institut fuer Informationssysteme
ETH Zentrum, IFW E43.2              Tel: +41-1-254-7226
CH - 8092 Zuerich                   Fax: +41-1-262-3973
************************************************************************

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Thanks again for all your help,

Tom

# Tom Donaldson               2400 Research Blvd., Suite 350       #
# Senior Software Developer   Rockville, MD 20850                  #
# Personal Library Software   (301) 990-1155, FAX: (301) 963-9738  #
# e-mail: tomd@pls.com                                             #
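
Dani Stieger's remark above, that a German compound has to be decomposed
so that the stemmer works on its final part, can be sketched in Python.
Everything here is an assumption for illustration: LEXICON is a
three-word stand-in for the 215'000-word dictionary, SUFFIXES is a toy
suffix list and not the Porter algorithm, and the function names
strip_suffix, split_compound and stem_compound are hypothetical. It does
not reproduce the MODULA-2 program mentioned in the message.

```python
# Toy sketch: decompose a German compound, then stem only its final part.
# Assumptions: LEXICON and SUFFIXES are made up for illustration, and
# strip_suffix is a naive stand-in, NOT the Porter algorithm.

LEXICON = {"haus", "tür", "schlüssel"}
SUFFIXES = ("en", "n", "s", "e")  # tried in this order


def strip_suffix(word):
    """Remove the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def split_compound(word):
    """Return (leading_parts, final_part), or None if nothing fits.

    Leading parts must be lexicon words; the final part may be inflected,
    so it is accepted if it or its stripped form is in the lexicon.
    """
    if word in LEXICON or strip_suffix(word) in LEXICON:
        return [], word
    for end in range(len(word) - 1, 0, -1):  # longest leading part first
        head = word[:end]
        if head in LEXICON:
            rest = split_compound(word[end:])
            if rest is not None:
                return [head] + rest[0], rest[1]
    return None


def stem_compound(word):
    """Decompose if possible and stem only the last component."""
    word = word.lower()
    split = split_compound(word)
    if split is None:
        return [strip_suffix(word)]  # unknown word: stem it as a whole
    leading, final = split
    return leading + [strip_suffix(final)]


print(stem_compound("Haustüren"))         # ['haus', 'tür']
print(stem_compound("Haustürschlüssel"))  # ['haus', 'tür', 'schlüssel']
```

A real system would also need to handle linking elements between
components and ambiguous splits, which this sketch ignores.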