Martin Mueller
Mon Apr 18 20:22:44 EDT 2011

I'm inclined to think that the progress of XML editors like oXygen has
changed the cost/benefit calculus of algorithmic and human editing. Ten
years ago the entry barrier to editing any XML text was pretty high. Now
continuous validation and a whole lot of other little things add up to an
environment where it is a matter of hours to teach willing undergraduates
(or other amateurs) enough oXygen tricks to lick rough-cut texts into
something like final shape in not much more time than a careful reading
of the text would take.

I'm aware of the optimism in that statement. On the other hand, if you
spent time (who has it?) on good tutorials, documentation, and all that
other boring stuff, you could get somewhere. It's a form of
"crowdsourcing" that would depend on "directing the crowd" in ways that
reward volunteers by making them feel they are doing something useful
(almost) right away.

So there will always be a lot of "eyeballing."  But progress could happen
if two conditions are met:

1. There are people with domain knowledge who see what is wrong with this
version of Ivanhoe or that version of Der Schwierige.
2. The inconsistencies are to some extent controllable, so that the domain
experts, amateurs, and would-be fixers have a lot of guidance in how
to tackle the problems the computer can't solve.
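
As a rough illustration of what "controllable" might mean in practice, here
is a minimal Python sketch (not mj's script, and not anything taken from this
thread) that tallies which element names each file uses and flags the ones
that show up in only a minority of files, so a human knows where to look
first. The function names, the 50% threshold, and the three toy documents are
all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def tag_profile(xml_text):
    """Count how often each element name occurs in one document."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter())


def rare_tags(docs, max_share=0.5):
    """Map each tag used in at most max_share of the docs to those docs.

    docs is a {name: xml_text} dict. The threshold is an arbitrary
    assumption; a real corpus would need tuning.
    """
    seen_in = defaultdict(list)
    for name, text in docs.items():
        for tag in tag_profile(text):
            seen_in[tag].append(name)
    limit = max_share * len(docs)
    return {tag: sorted(names)
            for tag, names in seen_in.items()
            if len(names) <= limit}


# Toy corpus (hypothetical): one file uses <para> where the others use <p>.
docs = {
    "a.xml": "<text><div><p>one</p></div></text>",
    "b.xml": "<text><div><p>two</p></div></text>",
    "c.xml": "<text><div><para>three</para></div></text>",
}
report = rare_tags(docs)
for tag, names in report.items():
    print(tag, "only in", ", ".join(names))
```

A report like this does not fix anything by itself. It only narrows down
where the eyeballing should start.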

What you want ultimately is a distributed editing platform for corpus-wide
collaborative curation of texts that have been brought up to some
reasonably predictable rough-cut form through algorithmic processes.

>Like Lou, I have done this several times, and met the same problem of
>inconsistency - very hard to resolve without eyeballing.
>I will be very interested to check mj's script
