[tei-council] on regularizing names

Thu Sep 21 09:13:30 EDT 2006

Hi Syd,

Nice set of choices. I prefer (j), pointer to a regularization, simple
case rules. I've become a fan of providing regularized information in
the header, then pointing to it from inside the text, for consistency.
I fear that relying on <reg> in this context could lead to
inconsistent regularization. I appreciate that it would be simpler for
the programmer to require a @key on <persName> rather than on
<regName>, but here again I like the idea of only having to do it
once. Otherwise, (i) is my second choice.

Dot

On 9/21/06, Syd Bauman <Syd_Bauman at brown.edu> wrote:
> Back in May I posted a discussion of regularization of names
> [http://lists.village.virginia.edu/pipermail/tei-council/2006/001353.html].
>
> Below I reproduce that list of possible solutions, with four more
> added. One of the new additions is the method Julia & Perry
> recommended in their work of 2005-07
> [http://lists.village.virginia.edu/pipermail/tei-council/2005/000600.html];
> two others are simplifications of it.
>
> I will then go through the list, and show that many suggestions are
> problematic at best, ending with a set of 7 choices for Council to
> consider.
>
> some possibilities
> ---- -------------
> a) <reg> on a par w/ the PCDATA inside name:
>      <persName>Syd
>        <reg>Bauman, Sydney D.</reg>
>      </persName>
>
> b) <reg> with a sister element inside name:
>      <persName>
>        <ZZZ>Syd</ZZZ>
>        <reg>Bauman, Sydney D.</reg>
>      </persName>
>    where ZZZ could be "literal", "asIs", "diplomatic", "transcribed"
>    or some such; if it is "orig", then this is the same as (e)
>
> c) names *in* <choice>:
>      <choice>
>        <persName>Syd</persName>
>        <reg>Bauman, Sydney D.</reg>
>      </choice>
>
> d) <choice> in names:
>      <persName>
>        <choice>
>          <orig>Syd</orig>
>          <reg>Bauman, Sydney D.</reg>
>        </choice>
>      </persName>
>
> e) name *is* <choice>, as it were:
>      <persName>
>        <orig>Syd</orig>
>        <reg>Bauman, Sydney D.</reg>
>      </persName>
>
> f) Sorry, no gaiji and no other languages in your
>    regularizations:
>    <persName reg="Bauman, Sydney D.">Syd</persName>
>
> g) Sorry, no gaiji, but use another attribute to represent a
>    different language:
>    <persName regLang="es" xml:lang="en"
>              reg="Bia, Alejandro">Alex</persName>
>
> h) Pointer to a regularization and/or to an authority record:
>    This is the method Julia & Perry recommended.
>    <persName reg="#reg.sb">Syd</persName>
>    <!-- meanwhile, in header or elsewhere: -->
>    <regName xml:id="reg.sb"
>             authority="LCNAF"
>             target="http://authorities.loc.gov/cgi-bin/Pwebrecon.cgi?AuthRecID=56..."
>             >Bauman, Sydney D.</regName>
>
> i) Pointer to a regularization, optional key=:
>    Use the basic gist of Julia & Perry's suggestion, but rather than
>    permit a pointer to an authority instead of content (which makes
>    the distinction between regularizing a name and disambiguating a
>    person a bit fuzzy), say that, like <persName>, the content of
>    <regName> is required, and is a *name*. Furthermore, like
>    <persName>, <regName> can bear a key= attribute. E.g.:
>       <p>In the 1940s he was known as
>       <persName reg="#reg25">Ritchie</persName>, but most of us
>       know him as <persName reg="#reg26">Ringo</persName>.</p>
>       <!-- meanwhile, in header or elsewhere: -->
>       <regName key="url:http://www.imdb.com/name/nm0823592/"
>                xml:id="reg25">Starkey, Richard</regName>
>       <regName key="url:http://www.imdb.com/name/nm0823592/"
>                xml:id="reg26">Starr, Ringo</regName>
>
> j) Pointer to a regularization, simple case rules:
>    As above, but don't permit key= on <regName>
>
> k) Pointer to a regularization, using <persName>: Rather than create
>    a special element <regName>, use <persName> inside some special
>    element in the TEI header (<nameList>, <list type="regularNames">,
>    some such).
>
>
> analyses
> --------
> (a) rubs most people (including me) the wrong way, because the
>     information is not in parallel structures. In some sense, we think
>     of the PCDATA content of an element as being different from, on a
>     different level than, the nested element's content.
>     But more importantly, this method (like several others) makes it
>     all but impossible for software to reliably extract either the
>     source name or the regularized name. This is because of the
>     inherent difficulty of differentiating
>       <persName>Barr., J<reg>Barrington, Jonathan U.</reg></persName>
>     from
>       <persName>Barr., <reg>J</reg></persName>
>
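A rough Python sketch of why option (a) defeats extraction, using only the standard-library ElementTree; the helpers `shape` and `naive_split` are made up for illustration. With text ignored, the two encodings above have exactly the same element structure, so an extractor can only guess what the <reg> covers.

```python
import xml.etree.ElementTree as ET

def shape(elem):
    """Element structure with all text dropped: (tag, [child shapes])."""
    return (elem.tag, [shape(child) for child in elem])

def naive_split(pers):
    """Treat text outside <reg> as the source name and the <reg>
    content as the regularized name, as option (a) invites."""
    reg = pers.find("reg")
    source = (pers.text or "") + "".join(c.tail or "" for c in pers)
    return source.strip(), (reg.text if reg is not None else None)

whole = ET.fromstring(
    "<persName>Barr., J<reg>Barrington, Jonathan U.</reg></persName>")
part = ET.fromstring("<persName>Barr., <reg>J</reg></persName>")

print(shape(whole) == shape(part))  # True: identical structure
print(naive_split(whole))  # ('Barr., J', 'Barrington, Jonathan U.')
print(naive_split(part))   # ('Barr.,', 'J'): wrong if <reg> was inline
```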
> (b) makes the above differentiation possible, if painful:
>     <xsl:if test="./ZZZ">
>       <xsl:value-of select="./reg"/>
>     </xsl:if>
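The same test sketched in Python. Here `literal` is a hypothetical stand-in for ZZZ (one of the names floated under (b)), and `regularized_name` is an illustrative helper, not part of any TEI tooling.

```python
import xml.etree.ElementTree as ET

def regularized_name(pers):
    """Regularization of the whole name under option (b), or None.
    Only when the (hypothetical) <literal> sister is present is the
    <reg> taken to regularize the whole name."""
    if pers.find("literal") is None:
        return None  # any <reg> here may be an ordinary inline one
    reg = pers.find("reg")
    return reg.text if reg is not None else None

with_zzz = ET.fromstring(
    "<persName><literal>Syd</literal><reg>Bauman, Sydney D.</reg></persName>")
inline = ET.fromstring("<persName>Barr., <reg>J</reg></persName>")
print(regularized_name(with_zzz))  # Bauman, Sydney D.
print(regularized_name(inline))    # None
```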
>
> (c) seems perfectly reasonable to me. It does, though, bring us back
>     to the problem of what <choice> may contain. (What would
>     <choice><name/><orig/><corr/><reg/></choice>
>     mean?)
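Under (c), extraction is a purely structural lookup. A minimal Python sketch, assuming no TEI namespace and a hypothetical helper `name_pairs`:

```python
import xml.etree.ElementTree as ET

def name_pairs(root):
    """Yield (source, regularized) for each <choice> that pairs a
    <persName> with a <reg>, as in option (c)."""
    for choice in root.iter("choice"):
        pers = choice.find("persName")
        reg = choice.find("reg")
        if pers is not None and reg is not None:
            yield ("".join(pers.itertext()).strip(),
                   "".join(reg.itertext()).strip())

doc = ET.fromstring(
    "<p>Thanks, <choice><persName>Syd</persName>"
    "<reg>Bauman, Sydney D.</reg></choice>!</p>")
print(list(name_pairs(doc)))  # [('Syd', 'Bauman, Sydney D.')]
```

The helper simply takes the first <persName> and first <reg> in each <choice>; if <orig> or <corr> were also present it would silently ignore them, which is the open question about what <choice> may contain.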
>
> (d) seems cumbersome, but requires no change to our schemas, just to
>     our prose & examples. However, it would be quite hard if not
>     impossible for software to differentiate
>       <persName>
>         <choice>
>           <orig>John</orig>
>           <reg>Barrington, John U.</reg>
>         </choice>
>       </persName>
>     from
>       <persName>
>         <choice>
>           <orig>Iohn</orig>
>           <reg>John</reg>
>         </choice>
>       </persName>
>
> (e) runs into trouble because both <orig> and <reg> are already
>     permitted as children of <name>. It would be quite hard to
>     differentiate
>       <persName>
>         <orig>Barr., J. V.</orig>
>         <reg>Barrington, Jonathan U.</reg>
>       </persName>
>     from
>       <persName>Barr., <orig>J</orig>. <reg>V</reg>.</persName>
>     (Not that anyone actually uses <orig> and <reg> like that, but we
>     don't want to rely on no one wanting to do that.)
>
> (f) is unacceptable. The main reasons to move something from an
>     attribute to an element are to be able to use gaiji within it and
>     to be able to say what natural language it's in. There is no
>     excuse for wanting a gaiji in a regularized name (if it's not in
>     Unicode, it's not a regularization; e.g., it couldn't be sorted
>     by any standard algorithm). However, there is every reason to
>     want regularizations in a different language than the source.
>     So (f) is out.
>
> (g) tries to solve the problem (f) runs into, but it violates the
>     explicit semantics of xml:lang=, which governs an element's
>     attribute values as well as its content; here it would declare
>     the Spanish reg= value to be English. This is a limitation we tied
>     ourselves to when we agreed to use xml:lang= and not tei:lang=,
>     and here we pay the consequences of that decision by not being
>     able to use (g).
>
> (h) has some strong advantages. However, the optional dual-pronged
>     approach both leaves the "am I pointing to a name or a person"
>     question a little fuzzy (but that can be dealt with by defining
>     the semantics clearly) and makes it harder for software to
>     actually find the regularized name.
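A sketch of what software has to do under (h), assuming no TEI namespace, a hypothetical helper `resolve_reg`, and made-up example.org authority URLs: following reg= yields a name, an authority pointer, or both, and the caller must cope with a missing name.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def resolve_reg(pers, doc):
    """Follow reg="#id" from a <persName> to its <regName> (option (h)).
    Returns (name, target); either may be None, since (h) allows
    content and/or a pointer."""
    ref = pers.get("reg", "")
    if not ref.startswith("#"):
        return None, None
    for reg_name in doc.iter("regName"):
        if reg_name.get(XML_ID) == ref[1:]:
            name = (reg_name.text or "").strip() or None
            return name, reg_name.get("target")
    return None, None  # dangling pointer

doc = ET.fromstring("""<TEI>
  <teiHeader>
    <regName xml:id="reg.sb" target="http://example.org/auth/1">Bauman, Sydney D.</regName>
    <regName xml:id="reg.ab" target="http://example.org/auth/2"/>
  </teiHeader>
  <text>
    <persName reg="#reg.sb">Syd</persName>
    <persName reg="#reg.ab">Alex</persName>
  </text>
</TEI>""")
syd, alex = list(doc.iter("persName"))
print(resolve_reg(syd, doc))   # ('Bauman, Sydney D.', 'http://example.org/auth/1')
print(resolve_reg(alex, doc))  # (None, 'http://example.org/auth/2')
```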
>
> (i) solves the fuzziness problem. reg= always points to a *name*,
>     which is nothing more than a regularization of a *name*. key=
>     always refers to a database record (possibly by pointing), which
>     is about a *person*; it is quite possible that said record has no
>     information other than a regularized name, of course. Note that
>     software needs to look in 2 places to try to find this database
>     record key, though: key= of <persName>, and if not there then the
>     key= of the <regName> pointed to by the reg= of <persName>.
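The two-place lookup under (i), sketched in Python with the Ringo example from (i). No TEI namespace is assumed, and `find_key` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def find_key(pers, doc):
    """Option (i): key= may sit on the <persName> itself or, failing
    that, on the <regName> its reg= points to; check both places."""
    key = pers.get("key")
    if key is not None:
        return key
    ref = pers.get("reg", "")
    if ref.startswith("#"):
        for reg_name in doc.iter("regName"):
            if reg_name.get(XML_ID) == ref[1:]:
                return reg_name.get("key")
    return None

doc = ET.fromstring("""<TEI>
  <teiHeader>
    <regName key="url:http://www.imdb.com/name/nm0823592/" xml:id="reg25">Starkey, Richard</regName>
    <regName key="url:http://www.imdb.com/name/nm0823592/" xml:id="reg26">Starr, Ringo</regName>
  </teiHeader>
  <text>
    <p>In the 1940s he was known as <persName reg="#reg25">Ritchie</persName>,
    but most of us know him as <persName reg="#reg26">Ringo</persName>.</p>
  </text>
</TEI>""")
for pers in doc.iter("persName"):
    print(pers.text, find_key(pers, doc))
# Ritchie url:http://www.imdb.com/name/nm0823592/
# Ringo url:http://www.imdb.com/name/nm0823592/
```

Under (j) the second branch disappears and the lookup reduces to `pers.get("key")`; in this example both names would then come back keyless until key= is copied onto each <persName>.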
>
> (j) solves the "two places to look" problem for the programmer by
>     forcing the encoder to put key= on each occurrence of a <persName>
>     she wants keyed, rather than allowing the indirection of
>     specifying the key= once on the <regName>.
>
> (k) takes advantage of the fact that <persName> already has the
>     content model you would want if you care about the inner details
>     of the name, and already has a key= attribute. The disadvantage is
>     that we would still have to create a special element, and that
>     <persName> would also bear a reg= attribute, which would be silly
>     when it was used in this context.
>
>
> Thus, I think there are only 7 viable solutions for Council to
> consider, listed here in my (current) personal order of preference:
>   (c): names *in* <choice>
>   (i): pointer to a regularization
>   (j): pointer to a regularization, no key=
>   (b): <reg> with a sister element inside name
>   (h): pointer to a regularization and/or to an authority record
>   (k): pointer to another <persName>
>   (e): name *is* <choice>, as it were
>
> _______________________________________________
> tei-council mailing list
> tei-council at lists.village.Virginia.EDU
> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
>

-- 
***************************************
Dot Porter, University of Kentucky
#####
Program Coordinator
Collaboratory for Research in Computing for Humanities
dporter at uky.edu          859-257-9549
#####
Editorial Assistant, REVEAL Project
Center for Visualization and Virtual Environments
porter at vis.uky.edu
***************************************