[tei-council] personography issues

Conal Tuohy Conal.Tuohy at vuw.ac.nz
Thu Sep 7 03:44:28 EDT 2006


I'm working on a project which exports biographical data out of a database into TEI P5, and I have a few questions about the new personographic elements. I'm sorry this email is so long!

I'll start by describing what I'm trying to achieve:

In general I want to be able to relate pieces of TEI to external authorities, in order to be able to harvest rich metadata from them. For instance, it's all very well to use a particular value of a "type" attribute, but unless you can link this value to something external to the document, it's not always possible to be sure that it means the same thing as when some other TEI document uses that same value. In my particular case, the biographical database contains a fairly sophisticated data model, and I want to be able to represent that model fully in TEI, and later be able to extract that data out of the TEI into other data structures, such as a semantic net, as Richard Light described in the email which Lou reposted to TEI-L recently.
http://listserv.brown.edu/archives/cgi-bin/wa?A2=ind0608&L=tei-l&P=11897

One of my requirements is to retain all the information from the DB, and indeed to document the relationship between the TEI markup and the DB tables and records. I want to explicitly refer to the database tables, and to use codes in the TEI which correspond to database keys. For this I've chosen to use taxonomies (using bibl to refer to the external database), and within each category to nest <name> elements whose key attributes equal the keys of those database tables.

The people in the biographical database I'm working with are classified in several ways, including with a 2-level taxonomy which is mostly of occupations. I initially mapped this taxonomy to a TEI taxonomy, and I used the <occupation> element to link each person to the occupations listed in the taxonomy. Later I realised that although most of the taxa in the database were occupations, some were not (e.g. "Racist"). As far as I can tell, those which are not occupations are all personal traits and could be modelled as <persTrait> or more specifically in some cases as <faith>.

I had hoped to represent all these classifications in TEI using a single generic markup which would cover everything. However, although there's a model.assertLike, there's currently no <assert> element. So my remaining option is to classify the taxa drawn from the database and relate them to specific TEI elements such as <faith> and <occupation>, so that they can each be represented with the most specific TEI markup available.

I have had another look at the personographic (model.assertLike) elements and I want to raise a few questions:

1) Uniquely among these elements, occupation is not a member of att.datable. Was that intended?

2) occupation and socEcStatus have 2 pointer attributes: a "scheme" (pointing to a taxonomy element) and a "code" (pointing to a category element within the taxonomy). It seems to me that "scheme" is redundant, since the "code" identifies the category and also implicitly identifies the taxonomy (i.e. the ancestor element of the category). Does that make sense? 
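
To illustrate the point in question 2, here is a minimal Python sketch showing that the taxonomy can be recovered from the "code" pointer alone by walking up from the category it identifies. The sample taxonomy, its xml:id values, and the function name taxonomy_for_code are all hypothetical, and the TEI namespace is left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical taxonomies, invented for illustration only.
SAMPLE = """
<classDecl>
  <taxonomy xml:id="occupations">
    <category xml:id="occ-farming">
      <catDesc>Farming</catDesc>
      <category xml:id="occ-shepherd"><catDesc>Shepherd</catDesc></category>
    </category>
  </taxonomy>
  <taxonomy xml:id="status">
    <category xml:id="ses-landowner"><catDesc>Landowner</catDesc></category>
  </taxonomy>
</classDecl>
"""

def taxonomy_for_code(root, code):
    """Return the xml:id of the taxonomy that contains the category `code` points to."""
    target = code.lstrip("#")
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent = {child: elem for elem in root.iter() for child in elem}
    for category in root.iter("category"):
        if category.get(XML_ID) == target:
            node = category
            # Climb through any nested categories until the taxonomy is reached.
            while node is not None and node.tag != "taxonomy":
                node = parent.get(node)
            return node.get(XML_ID) if node is not None else None
    return None

root = ET.fromstring(SAMPLE)
print(taxonomy_for_code(root, "#occ-shepherd"))
```

Under these assumptions a separate "scheme" pointer adds nothing that a processor could not derive for itself.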

3) Some elements lack a "type" attribute which I think could be useful. e.g. affiliation doesn't have a type attribute. So it's possible to say that the person with key "xxx" is affiliated to the organisation with key "yyy", but it doesn't seem possible to formally declare and identify the type of affiliation (member, life member, secret member, president, etc) in the TEI. However, because affiliation is a member of att.naming, the tei:affiliation can be linked to an external record by a key attribute, and this external record could provide a type. As well as affiliation, I could see a possible case for adding a type attribute to education ("primary", "secondary") and even to residence ("holiday", "permanent"). Makes sense?

4) The <relation> element has a @type (e.g. "social", "personal", "other") and also a @name attribute which further defines the type of relationship. It seems to me that this "name" attribute is pretty much equivalent to the "subtype" attribute defined by the att.typed class, and I wonder whether, for consistency's sake, relation shouldn't just be a member of att.typed, and lose the "name" attribute? 

5) In any case, it would be nice if it were possible to document this value somewhere, i.e. to link it to a taxonomy or other formal declaration. The note in the guidelines for the att.typed class says that @type and @subtype may be formally defined in the classification declaration (and incidentally, the note refers spuriously to a <classification> element instead of - I assume - <classDecl>). 

	The typology used may be formally defined using the the <classification> element of the <encodingDesc> 
	within the associated TEI header, or informally as descriptive prose within the <encodingDesc> element.
		from http://www.tei-c.org/release/doc/tei-p5-doc/html/ref-att.typed.html

Is the documentation intended to mean that the values of the "type" attributes should correspond to the xml:id attributes of category elements in the classDecl? I think it would be good to be explicit about how this formal definition is supposed to work. In particular, I would like to be able to unambiguously, and in a machine-readable way, indicate that a "type" attribute is formally defined. Can anyone tell me how this is supposed to work?

If this isn't feasible, then I'd like to use an attribute whose type is a pointer (to a category) instead, as occupation and socEcStatus do. 
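
One possible reading of the att.typed note, sketched below in Python, is that every "type" value should match the xml:id of a category declared in the header. This is only an interpretation, not something the Guidelines confirm. The sample document, its category ids, and the function name undeclared_types are hypothetical, and the structure is simplified (no TEI namespace, and <relation> placed directly under the root).

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified, hypothetical document: two declared categories, two relations.
SAMPLE = """
<TEI>
  <classDecl>
    <taxonomy xml:id="relation-types">
      <category xml:id="social"><catDesc>Social relationship</catDesc></category>
      <category xml:id="personal"><catDesc>Personal relationship</catDesc></category>
    </taxonomy>
  </classDecl>
  <relation type="social" name="employer"/>
  <relation type="political" name="ally"/>
</TEI>
"""

def undeclared_types(root):
    """List the type values that have no category with a matching xml:id."""
    declared = {c.get(XML_ID) for c in root.iter("category")}
    used = {e.get("type") for e in root.iter() if e.get("type") is not None}
    return sorted(used - declared)

root = ET.fromstring(SAMPLE)
print(undeclared_types(root))
```

A check like this is only possible if the correspondence between "type" values and category identifiers is stated explicitly somewhere, which is the gap the question is about.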

I've had to resort to some awkward encoding for some things. For example, since <persTrait> and <affiliation> have "key" attributes rather than pointers to categories, to keep a record of what the key means I have added a taxonomy with categories containing <name> elements with matching keys. If those keys had been pointers I could have pointed directly to a category element. Also, although TEI keys are intended to refer to external database keys etc., there seems to be no way to specify which database a key corresponds to: in effect there's a single namespace of keys. Where you have to refer to multiple database tables, you need to be able to say which table the key belongs to (something like the "scheme" and "code" attributes used to point at TEI categories).

	key	provides a means of locating a full definition for the entity being named such as a database record key or URI.

For example, I have a database record which asserts that person X is a member of a tribe (or "iwi", as they are called here) whose database key is "934", and whose name (in the db) is "Ngāi Tahu (South Island)". I can represent this as:

<persTrait key="iwi-934">
	<label>Tribal Affiliation</label>
	<p>Ngāi Tahu (South Island)</p>
</persTrait>

or

<affiliation key="iwi-934">Ngāi Tahu (South Island)</affiliation>

or

<affiliation><orgName key="iwi-934">Ngāi Tahu (South Island)</orgName></affiliation>

I can document that the information came from the database table something like this:

<taxonomy xml:id="tribes">
	<bibl>Tribes. Derived from the DNZB database table <title>tblTribe</title></bibl>
	<category>
		<catDesc><name key="iwi-934">Ngāi Tahu (South Island)</name></catDesc>
	</category>
</taxonomy>
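
Because all keys share a single namespace, a key such as "iwi-934" has to carry its own table information. The Python sketch below assumes a purely hypothetical convention in which the part before the last hyphen names the source table. The KEY_TABLES mapping and the split_key function are illustrative only, and nothing in TEI defines such a convention.

```python
# Hypothetical mapping from key prefix to database table (an assumption, not TEI).
KEY_TABLES = {"iwi": "tblTribe"}

def split_key(key):
    """Split a prefixed key into (table name, record id), or raise ValueError."""
    prefix, sep, record_id = key.rpartition("-")
    if not sep or prefix not in KEY_TABLES:
        raise ValueError(f"unrecognised key: {key!r}")
    return KEY_TABLES[prefix], record_id

print(split_key("iwi-934"))
```

The weakness of this approach is that the convention lives outside the document, which is why a declared "scheme"-like pointer for keys would be preferable.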

I'd like to also be able to link "iwi-934" to an external authority. The National Library of NZ maintains such an authority at http://iwihapu.natlib.govt.nz/ (a "hapu" by the way is a sub-tribe).

<taxonomy xml:id="iwihapu.natlib.govt.nz">
	<bibl><ref target="http://iwihapu.natlib.govt.nz/">Iwi-Hapū Names List</ref></bibl>
	<category>
		<catDesc><name key="iwi-934">Kāi Tahu</name></catDesc>
	</category>
</taxonomy>

The idea of using keys in this way is to declare an equivalence between categories in different taxonomies.
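
A harvester could exploit that equivalence roughly as in the following Python sketch, which groups the names found in each taxonomy under their shared key. The function name names_by_key is hypothetical, the <bibl> elements are omitted, the TEI namespace is left out, and the code accepts either xml:id or a plain id on <taxonomy>.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The two taxonomies from this message, reduced to their categories.
SAMPLE = """
<classDecl>
  <taxonomy xml:id="tribes">
    <category>
      <catDesc><name key="iwi-934">Ngāi Tahu (South Island)</name></catDesc>
    </category>
  </taxonomy>
  <taxonomy xml:id="iwihapu.natlib.govt.nz">
    <category>
      <catDesc><name key="iwi-934">Kāi Tahu</name></catDesc>
    </category>
  </taxonomy>
</classDecl>
"""

def names_by_key(root):
    """Map each key to {taxonomy id: name text} across all taxonomies."""
    index = defaultdict(dict)
    for taxonomy in root.iter("taxonomy"):
        # Accept xml:id or a plain id attribute on the taxonomy.
        tax_id = taxonomy.get(XML_ID) or taxonomy.get("id")
        for name in taxonomy.iter("name"):
            key = name.get("key")
            if key:
                index[key][tax_id] = (name.text or "").strip()
    return dict(index)

root = ET.fromstring(SAMPLE)
for key, names in names_by_key(root).items():
    print(key, names)
```

Any key that appears under more than one taxonomy id is then treated as naming the same entity in each authority.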

NB "Kāi Tahu" is the name used by the tribe itself, and it's the "official" name of the tribe. 

It would be nice, though, if the category element itself had a key attribute:
	
<affiliation key="iwi-934">Ngāi Tahu (South Island)</affiliation>

<taxonomy xml:id="tribes">
	<bibl>Tribes. Derived from the DNZB database table <title>tblTribe</title></bibl>
	<category key="iwi-934">
		<catDesc>Ngāi Tahu (South Island)</catDesc>
	</category>
</taxonomy>

<taxonomy xml:id="iwihapu.natlib.govt.nz">
	<bibl><ref target="http://iwihapu.natlib.govt.nz/">Iwi-Hapū Names List</ref></bibl>
	<category key="iwi-934">
		<catDesc>Kāi Tahu</catDesc>
	</category>
</taxonomy>

I would be interested to hear if anyone has an answer to my numbered questions. I'm sorry if this email seems rambly and unfocused ... I confess I'm a bit flummoxed by the whole thing, though I think what I have is very nearly satisfactory. If anyone had any suggestions of alternative approaches I would also appreciate hearing them.

Regards

Con




