[tei-council] sex confession

Syd Bauman Syd_Bauman at Brown.edu
Thu Oct 20 09:57:12 EDT 2005


Last night I got a few datatype fixes into P5 just before Sebastian
went off to make the released version and press CDs for the Members'
Meeting. Of those half a dozen or so fixes, I goofed one: sex= of
<personGrp>. Not only didn't I fix it, I made it worse.

It had a <valList> that didn't match the datatype. I have no idea
what kind of drugs I was on, but in my haste rather than do the right
thing (simply delete the <valList>), I changed the <valList> to more
closely match the datatype. So now, instead of matching sex= of
<person>, the sex= attribute of <personGrp> is idiotic. Ignore it.
It will be fixed in CVS by this time tomorrow.

However, it did raise an important issue about this particular
datatype.

Problem
-------
We agreed to use ISO 5218 codes for the value of sex= of <person> and
<personGrp>:
    * 0 = not known,
    * 1 = male,
    * 2 = female,
    * 9 = not specified.
However, <personGrp> (at least as P4 conceives it) requires an
additional value: "mixed" -- the group contains people of different
sexes.

Solutions
---------
Some possible solutions are:
1. Add a new numeric code, say "7", that means "mixed", to data.sex.
2. Use
     attribute sex { data.sex | "mixed" }
   for sex= of <personGrp>
3. Change the datatype so that the values are "not known", "male",
   "female", and "not specified", and then use #2.
4. Remove "mixed" from the possible values of sex= of <personGrp> and
   either
   a) say "you can't say that -- tough", or
   b) say "if sex= is *not* specified, it is presumed to be mixed"
5. Remove sex= from <personGrp> entirely -- if you want to know,
   check the children[1]

At the moment I am leaning towards #2, at least until those who work
on our "prosopography" work on the issue. With this solution the
datatype still maps cleanly to ISO, and it is very clear which values
of sex= of <personGrp> are from ISO (numeric codes) and which are
made up by TEI (contain letters).
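
To illustrate how a processor could tell the two kinds of value apart
under #2, here is a minimal sketch in XSLT 1.0. It assumes the allowed
values are exactly the four ISO codes listed above plus "mixed"; the
stylesheet wrapper and the message wording are made up for this
illustration and are not part of any TEI stylesheet.

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Report, for each personGrp that carries sex=, where its value
         comes from under solution #2. -->
    <xsl:template match="personGrp[@sex]">
      <xsl:message>
        <xsl:text>sex="</xsl:text>
        <xsl:value-of select="@sex"/>
        <xsl:choose>
          <!-- the four ISO 5218 codes -->
          <xsl:when test="@sex='0' or @sex='1' or @sex='2' or @sex='9'">
            <xsl:text>" is an ISO 5218 code</xsl:text>
          </xsl:when>
          <!-- the one value added by TEI -->
          <xsl:when test="@sex='mixed'">
            <xsl:text>" is the TEI addition</xsl:text>
          </xsl:when>
          <xsl:otherwise>
            <xsl:text>" is not allowed under solution #2</xsl:text>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:message>
    </xsl:template>

    <!-- Suppress the built-in copying of text nodes to the output. -->
    <xsl:template match="text()"/>

  </xsl:stylesheet>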


Note
----
[1] The following fragment of XSLT 1.1 does the job iff there are
    only male and female children. One could obviously extend this to
    cover the "not known" and "not specified" cases as well.

  <xsl:template match="personGrp">
    <xsl:variable name="sex">
      <xsl:choose>
        <xsl:when test="count(./person[@sex='1']) = 0 and count(./person[@sex='2']) > 0">
          <xsl:text>all female</xsl:text>
        </xsl:when>
        <xsl:when test="count(./person[@sex='1']) > 0 and count(./person[@sex='2']) = 0">
          <xsl:text>all male</xsl:text>
        </xsl:when>
        <xsl:when test="count(./person[@sex='1']) > 0 and count(./person[@sex='2']) > 0">
          <xsl:text>mixed company</xsl:text>
        </xsl:when>
      </xsl:choose>
    </xsl:variable>
    <xsl:message>
      personGrp # <xsl:value-of select="@n"/> is <xsl:value-of select="$sex"/>.
    </xsl:message>
  </xsl:template>

   Of course, one could go a lot further:

        <xsl:when test="count(./person[@sex='1']) > 0.6 * count(./person)">
          <xsl:text>mostly male</xsl:text>
        </xsl:when>
        <xsl:when test="count(./person[@sex='2']) > 0.6 * count(./person)">
          <xsl:text>mostly female</xsl:text>
        </xsl:when>

   (BTW, I'm not claiming this is *good* XSLT code, just proof of
   concept.)
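
   The extension to the "not known" and "not specified" cases could
   look something like the sketch below (XSLT 1.0). Two things in it
   are assumptions rather than anything agreed: a <person> whose sex=
   is anything other than "1" or "2" (including "0", "9", or no sex=
   at all) is treated as undetermined, and a group with any such
   member is reported as "not determinable" unless it is already known
   to be mixed. The labels and the stylesheet wrapper are likewise
   made up for this illustration.

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="personGrp">
      <!-- number of male and female child <person> elements -->
      <xsl:variable name="m" select="count(person[@sex='1'])"/>
      <xsl:variable name="f" select="count(person[@sex='2'])"/>
      <!-- every other child <person>: sex= is 0, 9, something else,
           or missing (assumption: all count as undetermined) -->
      <xsl:variable name="u" select="count(person) - $m - $f"/>
      <xsl:variable name="sex">
        <xsl:choose>
          <!-- at least one of each: mixed, whatever the rest are -->
          <xsl:when test="$m > 0 and $f > 0">mixed company</xsl:when>
          <!-- an undetermined member, or no members with a known sex -->
          <xsl:when test="$u > 0 or $m + $f = 0">not determinable</xsl:when>
          <xsl:when test="$m > 0">all male</xsl:when>
          <xsl:otherwise>all female</xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <xsl:message>
        personGrp # <xsl:value-of select="@n"/> is <xsl:value-of select="$sex"/>.
      </xsl:message>
    </xsl:template>

    <!-- Suppress the built-in copying of text nodes to the output. -->
    <xsl:template match="text()"/>

  </xsl:stylesheet>

   Counting the undetermined members as "everything that is not 1 or
   2" keeps the four branches exhaustive; a stricter version could
   report unexpected values of sex= separately.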



