4.0777 AI Bibliography Server (1/190)

Elaine Brennan & Allen Renear (EDITORS@BROWNVM.BITNET)
Wed, 28 Nov 90 23:53:19 EST

Humanist Discussion Group, Vol. 4, No. 0777. Wednesday, 28 Nov 1990.

Date: Fri, 16 Nov 90 18:36 EST
From: "NANCY M. IDE (914) 437 5988" <IDE@VASSAR>
Subject: AI bibliography server

from nl-kr digest:

From: "Alfred Kobsa" <ak@cs.uni-sb.de>
Date: Tue, 6 Nov 90 13:23:49 +0100
Subject: MAILSERVER FOR AI LITERATURE

THE LIDO MAILSERVER FOR AI LITERATURE Version 2.0

A mail server has been developed at the Computer Science Department of the
University of Saarbruecken which accesses a large database of bibliographic
data of articles pertaining to the field of Artificial Intelligence. At the
moment, this database contains more than 13,000 articles, which can be
retrieved via electronic mail. The result will be returned either in LaTeX
(Bibtex) format or in a Refer-like format.

This mail server is a "by-product" of the bibliographic information system
LIDO which is currently under development at the University of Saarbruecken.
The following people are involved in this project:

Coordination:  Alfred Kobsa
Hacker:        Monika Klar
               Alfred Kobsa
               Peter Schwarz
Wizards:       Gerd Herzog
               Clemens Huwig
Mail-Freak:    Roman Jansen-Winkeln
Data Input:    Christa Weinen
               Gisela Veit

The LIDO MAILSERVER is partly based on the UNIX refer system. Queries to the
bibliographic database are restricted to the names of the author(s), the title,
and the year of publication. Users may select between full word search (fast,
since index-based; hence prioritized processing) and substring search with
optional regular expressions. Global search with key words is *not* possible.
Users who already have a certain overview of a field will thus probably profit
more from the LIDO MAILSERVER than novices familiarizing themselves with a new
area.

In order to keep the network and computer workload tolerable and to control
erroneous queries, certain security limits have been introduced:
1. No more than 150 articles may be retrieved per query, and no more than
   500 per message.
2. Queries with the option `nosubstring' are handled with priority.

Since LIDO is still under development, it cannot be distributed yet. However,
the bibliographic data (3 MB at the moment) may be obtained on a license basis
for a fee of U.S.$ 75.00-300.00 via ftp or on tape. Please understand that
we cannot lend out or copy the articles listed in the bibliographic database.
If you find an error, please send a note to
bib-1@cs.uni-sb.de.

Messages to the LIDO MAILSERVER should be sent to

lido@cs.uni-sb.de

and should have the following format:

a) Subject field:

- First the key word `lidosearch'.
- Then the desired format of the bibliographic data in the return message:
`latex' (= Bibtex format) or `nolatex' (= refer-like format). The default
is `nolatex'.
- Then the form of retrieval:
a) `nosubstring': Your search patterns (see below) must be full words.
Your message will be handled with priority.
b) `substring' (default): Your search patterns may be substrings. Regular
expressions in the egrep notation (see Appendix) may be used as well.
Plural forms and spelling variants can thereby be accounted for.
- Then the language that should be used for comments and error messages in
the return message: `english' or `deutsch' (default).

b) Body of the Message:

Each line of the body of the message contains one or more search patterns
which may refer to the names of the authors, to words in the title, or to
the year of publication. If a line contains more than one search pattern,
only those articles are retrieved which match *all* patterns. German umlauts
and the `scharfes s' should be transliterated as follows: A", O", U", a",
o", u", s"
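
The message format described above can be sketched in Python as follows.
This is only an illustration: the helper names `transliterate' and
`build_lido_message' are invented here and are not part of LIDO, and the
sketch builds the message without sending it.

```python
from email.message import EmailMessage

# Transliteration given in the announcement: umlauts and the sharp s are
# written as the base letter followed by a double quote.
TRANSLIT = {
    "Ä": 'A"', "Ö": 'O"', "Ü": 'U"',
    "ä": 'a"', "ö": 'o"', "ü": 'u"',
    "ß": 's"',
}


def transliterate(text):
    """Replace German umlauts and sharp s with the LIDO spelling."""
    return "".join(TRANSLIT.get(ch, ch) for ch in text)


def build_lido_message(query_lines, fmt="nolatex", mode="substring",
                       language="deutsch"):
    """Build a LIDO request (hypothetical helper, not part of LIDO).

    Each element of `query_lines` is one line of search patterns,
    i.e. one query. Defaults mirror those stated in the announcement.
    """
    if fmt not in ("latex", "nolatex"):
        raise ValueError("fmt must be 'latex' or 'nolatex'")
    if mode not in ("substring", "nosubstring"):
        raise ValueError("mode must be 'substring' or 'nosubstring'")
    if language not in ("english", "deutsch"):
        raise ValueError("language must be 'english' or 'deutsch'")

    msg = EmailMessage()
    msg["To"] = "lido@cs.uni-sb.de"
    msg["Subject"] = f"lidosearch {fmt} {mode} {language}"
    body = "\n".join(transliterate(line) for line in query_lines)
    msg.set_content(body + "\n")
    return msg


message = build_lido_message(
    ["wahlster", "generation", "kobsa models 1989"],
    fmt="latex", mode="nosubstring", language="english",
)
```

The resulting object could be handed to any mail transport; how it is
delivered is outside the scope of this sketch.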

Example 1:
---------
mail lido@cs.uni-sb.de
Subject: lidosearch latex nosubstring english

wahlster
generation
kobsa models 1989

This message contains three different queries. In the first case, all articles
are retrieved which contain the word `wahlster' as an author's name or as a
word in the title. In the second case, the same applies to `generation'. In the
third case, all articles are retrieved which contain `kobsa', `models', and
1989 (but not those containing only `model', since `nosubstring' was selected).
The message will be handled with priority since `nosubstring' was chosen. The
references in the return message will be in LaTeX (Bibtex) format, and error
messages and comments will be in English.
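
The full-word behaviour of Example 1 can be approximated with the sketch
below. It is not the server's code: the record layout, the tokenization,
and the case-insensitive comparison are assumptions made for illustration,
and the two records are made-up placeholders rather than real database
entries.

```python
import re


def record_words(record):
    """Split authors, title and year of a record into lower-case words."""
    text = " ".join([record["authors"], record["title"], str(record["year"])])
    return set(re.findall(r'[a-z0-9"]+', text.lower()))


def matches_full_words(record, query_line):
    """True if every pattern on the query line equals some whole word."""
    words = record_words(record)
    return all(pattern in words for pattern in query_line.lower().split())


# Placeholder records, invented for this sketch.
RECORDS = [
    {"authors": "Kobsa", "title": "Placeholder title on user models", "year": 1989},
    {"authors": "Kobsa", "title": "Placeholder title on a user model", "year": 1989},
]

hits = [r["title"] for r in RECORDS if matches_full_words(r, "kobsa models 1989")]
```

Only the first placeholder record is selected: the second contains `model'
but not the full word `models', and all patterns on a line must match.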

Example 2:
---------
mail lido@cs.uni-sb.de
Subject: lidosearch latex substring english

kobs natu"rlichspr

This message contains a single query only. All articles will be retrieved which
contain both the substring `kobs' (as in `Kobsa' or `Jakobson') and the
substring `natu"rlichspr'. The return message will come in LaTeX format,
and error messages and comments will be in English.

Example 3:
---------
mail lido@cs.uni-sb.de
Subject: lidosearch substring english

morpholog(y|ie)
modell?ing
modell*ing
model+ing
ja[ck]obson
\<kobs

This message contains six queries which will yield articles containing the
following strings in the titles or authors' names (the output will come in
a refer-like format, and the comments will be in English):

Query 1: `morphology' or `morphologie' (German spelling)
Query 2: `modeling' or `modelling'
Query 3+4: `modeling', `modelling', `modellling', etc.
Query 5: `jacobson' or `jakobson'
Query 6: `kobs' at the beginning of a word (thus articles by Kobsa but
not by Jakobson are found).
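
The six patterns can be tried out locally with Python's `re' module, as in
the sketch below. Python does not understand the egrep anchors \< and \>,
so the hypothetical helper `egrep_to_python' rewrites them with \b plus a
lookaround; this naive rewrite is an assumption of the sketch and would
mishandle a pattern containing an escaped backslash directly before < or >.
Matching is done case-insensitively, which is also an assumption.

```python
import re


def egrep_to_python(pattern):
    """Rewrite the egrep word anchors into Python equivalents."""
    return (pattern.replace(r"\<", r"\b(?=\w)")
                   .replace(r"\>", r"\b(?<=\w)"))


def matching_words(pattern, candidates):
    """Return the candidates that contain a match for the pattern."""
    rx = re.compile(egrep_to_python(pattern), re.IGNORECASE)
    return [word for word in candidates if rx.search(word)]


CANDIDATES = ["morphology", "morphologie", "modeling", "modelling",
              "modellling", "jacobson", "jakobson", "kobsa"]

PATTERNS = ["morpholog(y|ie)", "modell?ing", "modell*ing", "model+ing",
            "ja[ck]obson", r"\<kobs", "kobs"]

for pattern in PATTERNS:
    print(pattern, "->", matching_words(pattern, CANDIDATES))
```

The last pattern, plain `kobs', is included for contrast with Query 6:
without the \< anchor it also finds `jakobson'.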

Summary:

mail lido@cs.uni-sb.de
Subject: lidosearch [help][info]                Sends this message
                    {[latex][nolatex]}          Default: nolatex
                    {[substring][nosubstring]}  Default: substring
                    {[english][deutsch]}        Default: deutsch
Body of message:
Query pattern(s) of first query
Query pattern(s) of second query
:
:
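
How the defaults in this summary combine can be sketched as a small
subject-line parser. This is a guess at the behaviour, not the server's
implementation: the function name `parse_subject', the returned dictionary,
and the decision to accept the option words in any order are all
assumptions.

```python
DEFAULTS = {"format": "nolatex", "mode": "substring", "language": "deutsch"}

# Each option word sets one field of the request.
OPTIONS = {
    "latex": ("format", "latex"),
    "nolatex": ("format", "nolatex"),
    "substring": ("mode", "substring"),
    "nosubstring": ("mode", "nosubstring"),
    "english": ("language", "english"),
    "deutsch": ("language", "deutsch"),
}


def parse_subject(subject):
    """Turn a Subject line into request settings (hypothetical helper)."""
    tokens = subject.lower().split()
    if not tokens or tokens[0] != "lidosearch":
        raise ValueError("subject must start with 'lidosearch'")
    if any(token in ("help", "info") for token in tokens[1:]):
        return {"action": "help"}
    settings = dict(DEFAULTS, action="search")
    for token in tokens[1:]:
        if token not in OPTIONS:
            raise ValueError(f"unknown option: {token}")
        field, value = OPTIONS[token]
        settings[field] = value
    return settings


settings = parse_subject("lidosearch substring english")
```

For the Subject line of Example 3 the format falls back to `nolatex',
which is why that example returns refer-like output.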

Bugs: Very long words are truncated by the refer program which underlies the
'nosubstring' mode of LIDO. Theoretically it could therefore happen that
additional undesired articles are retrieved by the LIDO MAILSERVER in
this mode when long patterns are employed.
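
The effect of this bug can be illustrated with a toy index whose keys are
cut to a fixed length. The announcement does not say where the cut-off
lies; KEY_LEN = 6 below is an arbitrary value chosen for the sketch, and
the lookup function is a stand-in, not the refer program.

```python
KEY_LEN = 6  # assumed cut-off, for illustration only


def index_key(word):
    """Index key of a word: lower-cased and truncated to KEY_LEN."""
    return word.lower()[:KEY_LEN]


def index_lookup(pattern, indexed_words):
    """Return all indexed words whose truncated key equals the pattern's."""
    return [w for w in indexed_words if index_key(w) == index_key(pattern)]


WORDS = ["generation", "generator", "general", "genetic"]
found = index_lookup("generation", WORDS)
```

With this cut-off, a search for `generation' also returns `generator' and
`general', because all three share the same first six letters.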

Good luck with your bibliographic search with LIDO!

------------------------------------------------------------------------------

REGULAR EXPRESSIONS

(egrep)        (explanation)

c              a single (non-meta) character matches itself.
.              matches any single character except newline.
?              postfix operator; preceding item is optional.
*              postfix operator; preceding item 0 or more times.
+              postfix operator; preceding item 1 or more times.
|              infix operator; matches either argument.
\<             matches the empty string at the beginning of a word.
\>             matches the empty string at the end of a word.
[chars]        matches any character in the given class; if the first
               character after [ is ^, matches any character not in the
               given class; a range of characters may be specified by
               first-last; for example, \W (below) is equivalent to the
               class [^A-Za-z0-9].
( )            parentheses are used to override operator precedence.
\digit         \n matches a repeat of the text matched earlier in the
               regexp by the subexpression inside the nth opening
               parenthesis.
\              any special character may be preceded by a backslash to
               match it literally.

(the following are for compatibility with GNU Emacs)
\b             matches the empty string at the edge of a word.
\B             matches the empty string if not at the edge of a word.
\w             matches word-constituent characters (letters & digits).
\W             matches characters that are not word-constituent.

Operator precedence is (highest to lowest) ?, *, and +, concatenation,
and finally |. All other constructs are syntactically identical to
normal characters. For the truly interested, the file dfa.c describes
(and implements) the exact grammar understood by the parser.
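
The precedence rule can be checked with a few whole-string matches. The
sketch uses Python's `re' module, which follows the same precedence for
these operators; the helper `whole_match' is invented for the example.

```python
import re


def whole_match(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


checks = {
    # Concatenation binds tighter than |, so ab|cd means (ab)|(cd).
    ("ab|cd", "ab"): whole_match("ab|cd", "ab"),
    ("ab|cd", "abd"): whole_match("ab|cd", "abd"),
    ("a(b|c)d", "abd"): whole_match("a(b|c)d", "abd"),
    # Postfix operators bind tighter than concatenation, so ab* means a(b*).
    ("ab*", "abbb"): whole_match("ab*", "abbb"),
    ("ab*", "abab"): whole_match("ab*", "abab"),
    ("(ab)*", "abab"): whole_match("(ab)*", "abab"),
}

for (pattern, text), result in checks.items():
    print(f"{pattern!r} matches {text!r}: {result}")
```

Parentheses are needed whenever an alternative or a repetition should
cover more than the single item next to the operator.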