[tei-council] genetic draft -- big questions

Tue Aug 23 18:38:15 EDT 2011

This is a two-part response. First, I offer responses to the specific
questions Lou has posed. (I've concatenated, renumbered, and edited them
a bit to incorporate language/ideas that have come up in others'
responses.) Second, I've jotted down a bunch of other thoughts that have
occurred to me as I've been reading through both Lou's draft, "Preliminary
updates for genetic editing in P5," and the current chapter 11 of the
Guidelines.

Part 1:

> 1. Is this draft headed basically in the right direction? If not, what are
> the essential things that must change in it?

I haven't gotten all the way through it yet, but so far I don't see any
reason to worry.

> 2. Should we announce the existence of this draft (AS A DRAFT) to TEI-L
> now, or do some more fiddling with it first?

This question has already generated lots of good responses, and if I
understand their general drift correctly, I agree with them: the council
is still making progress and therefore should probably be allowed to
press forward a while longer first.

> 3. The draft still uses the name "<facsimile>" for the element which
> contains a document-level transcription along with or independent of a
> collection of images. The workgroup specifically said that they didn't like
> this name, and it is also confusing for those who are accustomed to the
> current P5 meaning for <facsimile>. What name shall we use instead?

I can live with sourceDoc, though I wish it didn't look so much like
sourceDesc, thus sort of inviting my brain to try to systematize the
semantics of "source" in a way that I think "isn't supported." But that's
a small quibble that one could probably make about lots of perfectly
good and serviceable existing element names.

> 4. Are people imagining something like
> 	a. (teiHeader, facsimile?, sourceDoc?, text?)
> 	b. (teiHeader, (sourceDoc?, text?)
> 	c. (teiHeader, facsimile?, sourceDoc?, text?), with the proviso that
> 	facsimile is deprecated and will disappear eventually
> 	d. (teiHeader, (sourceDoc | facsimile)?, text?)

I'm not sure where the missing parenthesis is supposed to go in b,
but I definitely think we need to make facsimile available for the present,
so I wouldn't vote for b anyway. I like Sebastian's suggestion, d, with
the same proviso as in c, about facsimile being deprecated, etc., provided
we can answer yes to Sebastian's question: "Is it definitely possible to use
<sourceDoc> in the same way as current <facsimile>, ie just with a load of
graphics for the zones and surfaces?"

> 5. transcription units: we propose using <line> <ab> or <seg>. What
> guidelines should we offer about when to choose which, beyond the
> following:
> - <seg> can be used within the other two to delimit any kind of unit
> - <line> implies a topographic line of some kind
> - <ab> implies any kind of block, possibly but not necessarily
> containing multiple <line>s

As others have said, I think we need to have a new element instead of a
repurposed <ab>. My first thought was also "block," though it doesn't
make me giddy. Like sourceDoc, it seems at least serviceable. Part? Bit?
Chunk? Morsel?

I was similarly worried about using <seg> in this new context, though I
can't point to any specific conflict between the existing semantics and
the proposed use. So if others who have thought about it are unconcerned,
I will be, too.

> 6. discussion of spanTo needs expansion. Some more real examples would
> help.
>
> 7. Should <metaMark> possibly have a <desc> child to describe it rather
> than relying on brief characterisation as attribute values?

I'll respond to these after I've gotten to the relevant parts of the draft.

> 8. The graphics included should all have proper citation information. Do we
> want so many? Can we find better (clearer) ones?

I'll look into the permissions question for the Whitman images, and,
depending on which ones folks think are keepers, I can also work to make
them clearer (using better sources, cropping, highlighting, whatever). I
could probably also hunt up different/better examples of some things, if
anyone has specific requests.

--------------

Part 2:

I'm not sure which, if any, of these comments will be welcome here. If they
should go somewhere else (or if I should just shut the hell up), I won't
mind being told so. In particular, fresh from last week's discussion of
authority for making changes to the source, I wondered whether any of these
would be considered "obvious errors" that I would be OK correcting
silently.

Chapter 11

The <desc> for att.global.facs (at the top of p. 346) and the <desc> for
att.coordinated (on both pp. 347 and 348) describe the attribute class in
terms of the elements it groups rather than, as is the norm, in terms of
the attributes it provides. I didn't look at every other attribute class's
<desc>, but of the ones listed under a-g on the webpage version, only two
others (att.enjamb and att.entryLike) are similar. I realize that there's
no real contradiction between saying that an attribute class provides
attributes for a particular set of concerns and saying that it groups
certain elements, but the first seems like the more useful information,
at least as the <desc>s are currently used in the Guidelines.

Also, I'm not sure why the definition of att.coordinated is repeated in
such a short space. I would think the two could be combined, with all of
the attributes listed at once and then discussed one by one, which I
believe is how the discussions of other attribute classes are organized
elsewhere.

@facs "may be used to associate any element in a transcribed text with an
image of it, by means of the usual URI pointing mechanism." This makes me
wonder what "usual" means. Usual when? To whom? If this is intended to
refer to something made explicit elsewhere in the Guidelines (the
definition of data.pointer in Appendix E, maybe?), an actual
cross-reference would seem in order.

> "By convention, this encoding indicates that the image indicated by facs
> attribute represents . . . ."

Seems to be missing "the" before "facs." I'd also rather avoid two such
close occurrences of "indicate"; perhaps "By convention, this encoding
implies that the image indicated by the facs attribute represents . . ."?

> "The recommended approach to encoding facsimiles is instead to use the facs
> attribute in conjunction with the elements facsimile, surface, and zone,
> which are also provided by this module."

This sentence immediately follows a paragraph that basically acknowledges
that for some purposes (specifically, "many straightforward ‘digital
library’ applications") <pb facs=""> is just fine. Mixed message.

At the bottom of p. 346 we are told that <zone> "defines a rectangular area
contained within a surface element," and that restriction is repeated at
the bottom of p. 352: "Zones need not nest within each other; they must
however be rectangular, as previously noted." At the top of p. 348, though,
we are told that "A zone may be rectangular or non-rectangular," and the
way to define a non-rectangular zone is specified. Also, this whole thing
makes me wonder why <surface> must be rectangular.

Preliminary updates document

At the top is a note saying that this is destined for insertion "after the
22nd para in section 11.1," which by my reckoning puts it between ". . .
which defines its coordinate space." and "In this example, . . . ." I'm
sure this must be wrong, but I'm curious what the correct place is. A
little bit farther down, there's a note about the placement of the Bovelles
example, but I wasn't able to quickly sort things out.

> "If, as is more often the case, a transcription of the zone identified in
> this way is to be included in the encoded document . . . ."

I'd favor "in the TEI document" here.

I'm pretty sure it's just a by-product of the current HTML processing and
not material, but I paused over the presence of pointy brackets around
some element names and not others.

> "This approach is illustrated in section ?? below."

Since Lou knows as well as or better than anyone that this illustration is
in the same section (11.1), is the suggestion that it would be bumped to a
new section?

> "Alternatively, if the transcription is intended to do no more than
> represent the physicality of the document itself . . . ."

Several things about this phrase puzzle me. Why "no more than"? The idea
that a transcription might represent physicality strikes me as at least
recondite--I'm not sure what to make of it without some unpacking. Finally,
in "the document itself," "document" seems ambiguous, and I'm not sure what
"itself" is doing.

> "<patch> contains a part of a written surface which was originally
> physically distinct but became attached to it at the time that one or more
> written zones were created."

Why "written"? Is the time of the creation referred to here the one that
took place in physical or digital space? What I'm starting to worry about,
I think, is that maybe "surface" is being used for both physical surfaces
and the ones created by the encoder. For this particular definition,
though, I'm now thinking that it's probably not helpful (even if it is
possible) to raise the issue of when the surfaces were created. Why does it
matter?

> "@binder	describes the method by which a patch is or was connected to
> the main surface".

I'm not wild about the name of the attribute, but I don't have anything
better to suggest. If it sticks, though, I suggest getting rid of "method."
One possibility: "material used to connect a patch to the main surface."

> "@flipping	indicates whether the patch is attached and folded in such a
> way as to provide two writing surfaces".

I definitely dislike this attribute name. No perfect alternatives come to
mind, but could we at least avoid the participle--"flippable"? Also, the
description misses the mark (I think). To my mind, whether the thing is
folded is irrelevant, and from the encoder's/reader's perspective the
flippable-ness has nothing to do with whether there is writing on the other
side. Of the manuscripts I've worked with, lots of non-flippable patches
have writing on the other side, and some flippable ones are blank on the
reverse. I suggest something along the lines of "indicates whether the
patch is attached in such a way as to allow it to be flipped (turned?
turned over?), bringing the reverse side into view."

> "<line> contains the transcription of a topographic line in the source
> document".

I could very well be getting loopy from spending too much time in editor
mode, but it bothers me a little that while I know what "a topographic
line" is supposed to mean (and I think "topographic" is pure genius), it's
not a line, really. It's something like "a series of characters vel sim.
[wink, wink] that share a region along the 'x' or 'y' or some other axis."

Brett

------------------
Brett Barney
Research Associate Professor
Center for Digital Research in the Humanities
University of Nebraska-Lincoln
bbarney2 at unl.edu
http://cdrh.unl.edu