13.0012 e-rating

Humanist Discussion Group (humanist@kcl.ac.uk)
Thu, 13 May 1999 20:33:44 +0100 (BST)

Humanist Discussion Group, Vol. 13, No. 12.
Centre for Computing in the Humanities, King's College London
<http://www.princeton.edu/~mccarty/humanist/>
<http://www.kcl.ac.uk/humanities/cch/humanist/>

Date: Thu, 13 May 1999 20:35:26 +0100
From: Frank Hubbard <6615hubbardf@marquette.edu>
Subject: e-raters

Willard,

This is an update on, and some backtracking from, my post of February 8
about ETS and the e-rater, which is now scoring GMAT exams.

First, the backtracking. My first post said "as far as we know now" about
how e-rater works. That phrase implied something I did not intend: that I
had some knowledge from ETS of how e-rater works. Grader and scoring
leader or no, I was not and am not privy to knowledge of that kind. People
who have scored for ETS, and we number in the thousands, will agree that
our work does not require that we know such things, and so we don't. We
apply scoring guides and sample papers to candidates' papers, and seek
consensus.

I didn't know what was in e-rater, as I hope my mention of seeking more
information, and my concluding questions, made clear. I knew that the
e-rater would be used on applicant-written arguments and analyses of
arguments, and that it would in some way try to parallel what human readers
do. So I should have said, "the techniques I know about from style
checkers, machine translation, style studies as I have encountered them in
court cases, and so on." The list I gave has offended, because only one of
the items, what I called "collocation," could even loosely be said to
figure directly in the papers Jill Burstein has published on e-rater (see
Mary Dee Harris's reply to me, February 16, both for these and for a
caution about revealing how e-rater actually works).

The update: e-rater is up and running. As far as I know, it is doing as
well as Mary Dee Harris said it would. Yes, it has reduced the hours that
scorers work on GMAT, and the benefit is faster return of scores. I do not
know whether there have been glitches, or what the plans are for e-rater's
development. I do not know, for instance, whether e-rater will go to work
on the GRE writing tasks when those are installed next fall.

Finally, my remark about how I regard large-scale testing has offended, so
I would like to offer some explanation. Every writing teacher knows, and I
am sure test-makers, academic program directors, and applicants themselves
know, that what a writing student can do on a timed test is not the same as
what that student can do with time to re-examine and revise. The short,
one-time test measures something, and that something is a useful ability.
But it is not the only ability academic program directors should care
about. And we hope it is not the only ability test-takers think we care
about, although, as Dr. Harris pointed out, it wouldn't hurt to have
test-takers care about structure, vocabulary, and so on.

The trouble with a numerical score for that ability is not of the
test-makers' making. It is created by people like me who work with
programs: the number is so easy to use that we rely on it too much,
contrary to what test-makers actually tell us we should do. Even though
the written scripts of student answers may be available, they aren't much
used, as far as my conversations with directors suggest.
So the tests are "very necessary," as I said, and "evil" because of what
can be done with them. My effort with them (with programs, to persuade them
not to rely on the tests alone; with ETS, to increase the feedback available
to the writer) tries to keep fast assessment on the one hand, and teaching
and learning on the other, in balance.

Frank Hubbard
Marquette University

-------------------------------------------------------------------------
Humanist Discussion Group
Information at <http://www.kcl.ac.uk/humanities/cch/humanist/>
<http://www.princeton.edu/~mccarty/humanist/>
=========================================================================