[tei-council] on regularizing names

Thu Sep 21 11:46:30 EDT 2006

> a) If it is a regular editorial correction, the usual <choice> is
> used, I assume: "it says Sebstian in the text, in fact that was a
> printer's error and it's Sebastian really"

This is error correction, nothing (or at least very little) to do
with regularization.
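
A minimal sketch of such a correction, assuming the <sic> and <corr>
children of <choice> (the quoted message names only <choice> itself):

```xml
<!-- The source reading goes in <sic>, the editor's fix in <corr> -->
<choice>
  <sic>Sebstian</sic>
  <corr>Sebastian</corr>
</choice>
```

Nothing here says anything about a preferred or sortable form of the
name; it only records that the printed form was a mistake.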

> b) If it says "Sebastian" in the text there and "SPQR" there, and
> you claim these are the same person, you'll point from each to a
> header record which gives the full name.

You are describing disambiguation of the *person*, not regularization
of the *name*, and in our world this is done with key=. It has
something to do with regularization, because we all presume that once
you have disambiguated the person, you have a regularized name
associated with that person. I.e., we presume that one doesn't
normally need to perform regularization and use key= simultaneously.
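
Roughly, and with an invented key value ("Seb01" is purely
hypothetical, as is the header record it would identify), the two
references might look like this:

```xml
<!-- Both strings carry the same hypothetical key, so both are
     claimed to refer to one person; the header record that the
     key identifies is not shown here. -->
<name type="person" key="Seb01">Sebastian</name>
<!-- elsewhere in the text -->
<name type="person" key="Seb01">SPQR</name>
```

The regularized form of the name would then live in whatever record
the key resolves to, not on the <name> elements themselves.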

> What third situation are you talking about where you do inline
> normalization?

Encoders very commonly want to regularize names for purposes of
searching, sorting, and retrieval. In most cases, this could also be
accomplished by keying the names, thus disambiguating the person as
well as providing a regularized name. Disambiguating people, however,
is often an enormously expensive task that requires significant
buy-in and time from scholars in the field. By contrast,
regularizing names can usually be accomplished by a trained
undergrad.

Thus there is often a desire to provide regularization (which in P4
is accomplished in-line with reg=) without performing disambiguation.
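
As a rough illustration of regularizing without disambiguating, a
P4-style encoding might carry only reg=. The sentence and the
regularized form below are invented for the example:

```xml
<!-- reg= supplies a form for searching and sorting; no key= is given -->
<p>Letters from
  <name type="person" reg="Godwin, William">Wm. Godwin</name>
  and from
  <name type="person" reg="Godwin, William">W. Godwin, Esq.</name>
</p>
```

The two strings will sort and search together, but the encoding makes
no claim that they name the same individual.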

That said, there are cases where regularization of the name and
disambiguation of the person are just completely different
activities, simply because there is not a 1:1 relationship between
people and names. Both "Mary Wollstonecraft Godwin" and "Mary Godwin
Shelley" refer to the same person, whereas the single name "Paul
Simon" may refer to either an ex-US Senator or a singer & songwriter.
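
A sketch of how the two attributes could diverge in these cases,
assuming both key= and reg= on <name>; the key values are made up
for illustration:

```xml
<!-- One person, two names: same hypothetical key, different reg -->
<name key="MWS" reg="Godwin, Mary Wollstonecraft">Mary Wollstonecraft Godwin</name>
<name key="MWS" reg="Shelley, Mary Godwin">Mary Godwin Shelley</name>

<!-- One name, two people: same reg, different hypothetical keys -->
<name key="PSsenator" reg="Simon, Paul">Paul Simon</name>
<name key="PSsinger" reg="Simon, Paul">Paul Simon</name>
```

In the first pair the key carries the identity and reg the form of
the name actually used; in the second pair reg alone cannot tell the
two people apart.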