more on Micro-OCP and hyphens (61)

Willard McCarty (MCCARTY@VM.EPAS.UTORONTO.CA)
Sat, 25 Feb 89 23:20:55 EST


Humanist Mailing List, Vol. 2, No. 652. Saturday, 25 Feb 1989.

Date: 25 Feb 89 14:51:54 EST (Sat)
From: Daniel Ridings <ridings@hum.gu.se>
Subject: Hyphens and Micro-OCP

Just so that there is no misunderstanding, I am including the exact
text file and the two .CTL (command) files.
------------------------------------text file-----------------------------
text text text text dia-
fulassoi text text text xrw-
ntai text text text text text.
------------------------------------CTL file 1----------------------------
*input
text hyphen "-".
*words
alph "A=a B=b C=c D=d E=e F=f G=g H=h I=i J=j K=k L=l M=m N=n O=o P=p
Q=q R=r S=s T=t U=u V=v W=w X=x Y=y Z=z".
*action
pick words "DIA* XRW*".
do concordance.
*go
-------------------------------------CTL file 2----------------------------
*input
text hyphen "-".
*words
alph "A=a B=b C=c D=d E=e F=f G=g H=h I=i J=j K=k L=l M=m N=n O=o P=p
Q=q R=r S=s T=t U=u V=v W=w X=x Y=y Z=z".
*action
pick words "DIAFUL* XRWN*".
do concordance.
*go
-------------------------------------END------------------------------------
The first CTL file finds the two instances. The second CTL file fails to
find anything. I might add that both of these files give correct results
with the version of OCP we run on our Data General mini (version 1.4).
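
To make the expected result concrete, here is a minimal Python sketch
(not OCP code) of the two searches run against the text file above. The
function names are hypothetical, and the "fragment" model is only a guess
at the kind of behaviour that would produce the symptom, not a description
of how Micro-OCP works internally.

```python
import fnmatch
import re

TEXT = """text text text text dia-
fulassoi text text text xrw-
ntai text text text text text."""


def joined_words(text):
    """Words after rejoining halves split by an end-of-line hyphen."""
    rejoined = re.sub(r"-[ \t]*\n", "", text)
    return re.findall(r"[A-Za-z]+", rejoined.upper())


def fragment_words(text):
    """Assumed faulty model: each half of a split word is its own word."""
    return re.findall(r"[A-Za-z]+", text.upper())


def pick(words, patterns):
    """Return the words matching any of the wild-card patterns."""
    return [w for w in words
            if any(fnmatch.fnmatchcase(w, p) for p in patterns)]


# Expected behaviour: both pattern sets find the same two words.
print(pick(joined_words(TEXT), ["DIA*", "XRW*"]))
print(pick(joined_words(TEXT), ["DIAFUL*", "XRWN*"]))

# Assumed faulty behaviour: the longer patterns find nothing.
print(pick(fragment_words(TEXT), ["DIA*", "XRW*"]))
print(pick(fragment_words(TEXT), ["DIAFUL*", "XRWN*"]))
```

With the halves rejoined, both pattern sets pick out DIAFULASSOI and
XRWNTAI. If only the fragments are matched, "DIA* XRW*" still hits while
"DIAFUL* XRWN*" comes back empty, which is the pattern of results
reported above.
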
This is a serious problem. Now that classical scholars have a
wealth of material to work with, thanks to the Thesaurus Linguae Graecae,
we tend to work with large files, and with large files it is to our
advantage to narrow the search criteria. The problem turned up when one
of our researchers wanted to find all of the forms of "diafulassei",
"diafulassoi", etc. in Plutarch. We were lucky: the instance that
Micro-OCP missed happened to be one he already knew of. It would be
unreasonable to have to search for "dia*" (and so retrieve every instance
of the preposition in Plutarch, ca. 7 Mbytes of text) when "diaful*"
would have done nicely.
So the problem is not with references between the two halves
of a hyphenated word. The problem is that a wild-card search fails on a
hyphenated word when the pattern extends past the hyphen. Please don't
get me wrong: Micro-OCP is an excellent help. My only wish is to warn of
the possible pitfalls. The most reasonable solution for the time being
would be to dehyphenate all words in a text. With Greek this is simple,
as hyphens are never part of a word, but in English text it could be a
problem, since a line-end hyphen may belong to the word itself, e.g.,
text text text text counter-
charge text text text.
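
The dehyphenation step suggested above could be done with a small
preprocessing script before the text is handed to Micro-OCP. What follows
is only a sketch in Python under stated assumptions: the function name
`dehyphenate` and the `keep_hyphen` list are hypothetical, words are taken
to be plain A-Z letters as in the transliterated examples, and for English
someone still has to supply the list of words whose hyphen is genuine.

```python
import re

# A word split by a hyphen at the end of a line, plus any punctuation
# attached to the second half and the whitespace that follows it.
SPLIT_WORD = re.compile(
    r"([A-Za-z]+)-[ \t]*\n[ \t]*([A-Za-z]+)(\S*)[ \t]*\n?"
)


def dehyphenate(text, keep_hyphen=()):
    """Rejoin words split by an end-of-line hyphen.

    keep_hyphen lists words (written with their hyphen) in which the
    hyphen is part of the word and must be kept, e.g. "counter-charge".
    The rejoined word is moved up to the first line and ends that line.
    """
    keep = {w.lower() for w in keep_hyphen}

    def rejoin(match):
        first, second, tail = match.groups()
        hyphenated = first + "-" + second
        if hyphenated.lower() in keep:
            return hyphenated + tail + "\n"
        return first + second + tail + "\n"

    return SPLIT_WORD.sub(rejoin, text)


greek = ("text text text text dia-\n"
         "fulassoi text text text xrw-\n"
         "ntai text text text text text.")
english = "text text text text counter-\ncharge text text text."

print(dehyphenate(greek))
print(dehyphenate(english))
print(dehyphenate(english, keep_hyphen=["counter-charge"]))
```

For the Greek sample an empty `keep_hyphen` list is enough. For English
the script cannot decide on its own whether "countercharge" or
"counter-charge" is meant, which is exactly the difficulty noted above.
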