[tei-council] on regularizing names
Dot Porter
dporter at uky.edu
Thu Sep 21 09:13:30 EDT 2006
Hi Syd,
Nice set of choices. I prefer (j), pointer to a regularization, simple
case rules. I've become a fan of providing regular information in the
header, then pointing to it from inside the text, for consistency. I
fear that relying on <reg> in this context could lead to inconsistent
regularization. I appreciate that it would be simpler for the
programmer to require a @key on <persName> rather than on <regName>,
but here again I like the idea of only having to do it once. But
otherwise, (i) is my second choice.
Dot
On 9/21/06, Syd Bauman <Syd_Bauman at brown.edu> wrote:
> Back in May I posted a discussion of regularization of names
> [http://lists.village.virginia.edu/pipermail/tei-council/2006/001353.html].
>
> I reproduce that list of possible solutions, with four more added
> here. One of the new additions to the list is that which Julia &
> Perry recommended in their work of 2005-07
> [http://lists.village.virginia.edu/pipermail/tei-council/2005/000600.html],
> another two are simplifications of that.
>
> I will then go through the list, and show that many suggestions are
> problematic at best, ending with a set of 7 choices for Council to
> consider.
>
> some possibilities
> ---- -------------
> a) <reg> on a par w/ the PCDATA inside name:
> <persName>Syd
> <reg>Bauman, Sydney D.</reg>
> </persName>
>
> b) <reg> with a sister element inside name:
> <persName>
> <ZZZ>Syd</ZZZ>
> <reg>Bauman, Sydney D.</reg>
> </persName>
> where ZZZ could be "literal", "asIs", "diplomatic", "transcribed"
> or some such -- if it is "orig", then this is same as (e)
>
> c) names *in* <choice>:
> <choice>
> <persName>Syd</persName>
> <reg>Bauman, Sydney D.</reg>
> </choice>
>
> d) <choice> in names:
> <persName>
> <choice>
> <orig>Syd</orig>
> <reg>Bauman, Sydney D.</reg>
> </choice>
> </persName>
>
> e) name *is* <choice>, as it were:
> <persName>
> <orig>Syd</orig>
> <reg>Bauman, Sydney D.</reg>
> </persName>
>
> f) Sorry, no gaiji and no other languages in your
> regularizations:
> <persName reg="Bauman, Sydney D.">Syd</persName>
>
> g) Sorry, no gaiji, but use another attribute to represent a
> different language:
> <persName regLang="es" xml:lang="en"
> reg="Bia, Alejandro">Alex</persName>
>
> h) Pointer to a regularization and/or a pointer:
> This is the method Julia & Perry recommended.
> <persName reg="#reg.sb">Syd</persName>
> <!-- meanwhile, in header or elsewhere: -->
> <regName xml:id="reg.sb"
> authority="LCNAF"
> target="http://authorities.loc.gov/cgi-bin/Pwebrecon.cgi?AuthRecID=56..."
> >Bauman, Sydney D.</regName>
>
> i) Pointer to a regularization, optional key=:
> Use the basic gist of Julia & Perry's suggestion, but rather than
> permit a pointer to an authority instead of content (which makes
> the distinction between regularizing a name and disambiguating a
> person a bit fuzzy), say that, like <persName>, the content of
> <regName> is required, and is a *name*. Furthermore, like
> <persName>, <regName> can bear a key= attribute. E.g.:
> <p>In the 1940s he was known as
> <persName reg="#reg25">Ritchie</persName>, but most of us
> know him as <persName reg="#reg26">Ringo</persName>.</p>
> <!-- meanwhile, in header or elsewhere: -->
> <regName key="url:http://www.imdb.com/name/nm0823592/"
> xml:id="reg25">Starkey, Richard</regName>
> <regName key="url:http://www.imdb.com/name/nm0823592/"
> xml:id="reg26">Starr, Ringo</regName>
>
> j) Pointer to a regularization, simple case rules:
> As above, but don't permit key= on <regName>
>
> k) Pointer to a regularization, using <persName>: Rather than create
> a special element <regName>, use <persName> inside some special
> element in the TEI header (<nameList>, <list type="regularNames">,
> or some such).
>
>
> analyses
> --------
> (a) rubs most people (including me) the wrong way, because the
> information is not in parallel structures. In some way, we think
> of the PCDATA content of an element as being different than, on a
> different level than, the nested element's content.
> But more importantly, this method (like several others) makes it
> all but impossible for software to reliably extract either the
> source name or the regularized name. This is because of the
> inherent difficulty differentiating
> <persName>Barr., J<reg>Barrington, Jonathan U.</reg></persName>
> from
> <persName>Barr., <reg>J</reg></persName>
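
A minimal sketch of why (a) is hard on software, assuming Python's standard xml.etree.ElementTree and the two fragments above; the helper name split_name is hypothetical. Both encodings parse to the same shape (leading text plus one <reg> child), so nothing in the markup itself says whether <reg> regularizes the whole name or only its last token.

```python
import xml.etree.ElementTree as ET

def split_name(fragment):
    """Return (leading text, <reg> text, trailing text) for a persName encoded as in (a)."""
    pers = ET.fromstring(fragment)
    reg = pers.find("reg")
    leading = pers.text or ""
    reg_text = reg.text if reg is not None else None
    trailing = (reg.tail or "") if reg is not None else ""
    return leading, reg_text, trailing

whole = split_name("<persName>Barr., J<reg>Barrington, Jonathan U.</reg></persName>")
partial = split_name("<persName>Barr., <reg>J</reg></persName>")

# Same tuple shape in both cases; only a human reading the strings can
# tell that the first <reg> stands for the entire name.
print(whole)
print(partial)
```

Any rule for telling the two apart would have to inspect the strings themselves, which is guesswork rather than markup.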
>
> (b) makes the above differentiation possible, if painful:
> <xsl:if test="./ZZZ">
> <xsl:value-of select="./reg"/>
> </xsl:if>
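
The same test can be sketched outside XSLT. This assumes Python's standard xml.etree.ElementTree and keeps ZZZ as the placeholder element name from (b); the function name whole_name_reg is hypothetical.

```python
import xml.etree.ElementTree as ET

def whole_name_reg(fragment):
    """Return the <reg> text only when a sister <ZZZ> wraps the source name."""
    pers = ET.fromstring(fragment)
    if pers.find("ZZZ") is None:
        # No sister element: any <reg> here regularizes only part of the name.
        return None
    return pers.findtext("reg")

sister = "<persName><ZZZ>Syd</ZZZ><reg>Bauman, Sydney D.</reg></persName>"
inline = "<persName>Barr., <reg>J</reg></persName>"

print(whole_name_reg(sister))
print(whole_name_reg(inline))
```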
>
> (c) seems perfectly reasonable to me. Does bring us back to the
> content of <choice> problem, though. (What would
> <choice><name/><orig/><corr/><reg/></choice>
> mean?)
>
> (d) seems cumbersome, but requires no change to our schemas, just to
> our prose & examples. However, it would be quite hard if not
> impossible for software to differentiate
> <persName>
> <choice>
> <orig>John</orig>
> <reg>Barrington, John U.</reg>
> </choice>
> </persName>
> from
> <persName>
> <choice>
> <orig>Iohn</orig>
> <reg>John</reg>
> </choice>
> </persName>
>
> (e) runs into trouble because both <orig> and <reg> are already
> permitted as children of <name>. It would be quite hard to
> differentiate
> <persName>
> <orig>Barr., J. V.</orig>
> <reg>Barrington, Jonathan U.</reg>
> </persName>
> from
> <persName>Barr., <orig>J</orig>. <reg>V</reg>.</persName>
> (Not that anyone actually uses <orig> and <reg> like that, but we
> don't want to rely on no one wanting to do that.)
>
> (f) is unacceptable. The main reasons to move something from an
> attribute to an element are to be able to use gaiji within it and
> to be able to say what natural language it's in. There is no
> excuse for wanting a gaiji in a regularized name (if it's not in
> Unicode, it's not a regularization, e.g., it couldn't be sorted
> by any standard algorithm). However, there is every reason to
> want to have regularizations in a different language than the
> source. So (f) is out.
>
> (g) tries to solve the problem (f) runs into, but this violates the
> explicit semantics of xml:lang=. This is a limitation we tied
> ourselves to when we agreed to use xml:lang= and not tei:lang=,
> and here we pay the consequences of that decision by not being
> able to use (g).
>
> (h) has some strong advantages. However, the optional dual-pronged
> approach both leaves the "am I pointing to a name or a person"
> question a little fuzzy (but that can be dealt with by defining
> the semantics clearly) and makes it harder for software to
> actually find the regularized name.
>
> (i) solves the fuzziness problem. reg= always points to a *name*,
> which is nothing more than a regularization of a *name*. key=
> always refers to a database record (possibly by pointing), which
> is about a *person*; it is quite possible that said record has no
> information other than a regularized name, of course. Note that
> software needs to look in 2 places to try to find this database
> record key, though: key= of <persName>, and if not there then the
> key= of the <regName> pointed to by the reg= of <persName>.
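
A rough sketch of the two-place lookup that (i) asks of software, assuming Python's standard xml.etree.ElementTree. The wrapper elements, the lack of a TEI namespace, the function name resolve_name and the key value "local:ringo" are all assumptions made for illustration; only the <regName> and reg= markup comes from the example in (i).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """<TEI>
  <teiHeader>
    <regName key="url:http://www.imdb.com/name/nm0823592/"
             xml:id="reg25">Starkey, Richard</regName>
    <regName key="url:http://www.imdb.com/name/nm0823592/"
             xml:id="reg26">Starr, Ringo</regName>
  </teiHeader>
  <text>
    <p>In the 1940s he was known as
      <persName reg="#reg25">Ritchie</persName>, but most of us know him as
      <persName reg="#reg26" key="local:ringo">Ringo</persName>.</p>
  </text>
</TEI>"""

def resolve_name(root, pers):
    """Return (regularized name, key) for one <persName>."""
    reg_names = {el.get(XML_ID): el for el in root.iter("regName")}
    target = reg_names.get((pers.get("reg") or "").lstrip("#"))
    regularized = target.text if target is not None else None
    # First place to look: key= on the <persName> itself.
    key = pers.get("key")
    # Second place: key= on the <regName> that reg= points to.
    if key is None and target is not None:
        key = target.get("key")
    return regularized, key

root = ET.fromstring(DOC)
results = [resolve_name(root, p) for p in root.iter("persName")]
for regularized, key in results:
    print(regularized, key)
```

Under (j) the fallback branch would simply disappear, since key= could only ever appear on <persName>.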
>
> (j) solves the "two places to look" problem for the programmer by
> forcing the encoder to put key= on each occurrence of a <persName>
> she wants keyed, rather than allowing the indirection of
> specifying the key= once on the <regName>.
>
> (k) takes advantage of the fact that <persName> already has the
> content model you would want if you care about the inner details
> of the name, and already has a key= attribute. The disadvantage is
> that we would still have to create a special element, and that
> <persName> would also bear a reg= attribute, which would be silly
> when it was used in this context.
>
>
> Thus, I think there are only 7 viable solutions for Council to
> consider, listed here in my (current) personal order of preference:
> (c): names *in* <choice>
> (i): pointer to a regularization
> (j): pointer to a regularization, no key=
> (b): <reg> with a sister element inside name
> (h): pointer to a regularization and/or a pointer
> (k): pointer to another <persName>
> (e): name *is* <choice>, as it were
>
--
***************************************
Dot Porter, University of Kentucky
#####
Program Coordinator
Collaboratory for Research in Computing for Humanities
dporter at uky.edu 859-257-9549
#####
Editorial Assistant, REVEAL Project
Center for Visualization and Virtual Environments
porter at vis.uky.edu
***************************************