[tei-council] Responses to Primary Sources #1 (up to the end of 11.1)

Martin Holmes mholmes at uvic.ca
Thu Nov 24 13:42:42 EST 2011

This is my first batch of feedback on the Primary Sources chapter draft.

First, I should say that this is becoming an excellent piece of work, 
and although I have many quibbles and criticisms, I don't want to appear 
merely critical. This is what I have so far:



Repetition of "for example". Recommend substituting "such as" for the 
second instance:

"It may sometimes contain a variety of images of the same source pages, 
for example of different resolutions, or of different kinds. Such a 
collection may form part of any kind of document, for example a 
commentary of a codicological or paeleographic nature, where there is a 
need to align explanatory text with image data."


Superfluous "And" at the beginning of this sentence, especially since 
"also" is present:

"And it may also be complemented..."


In this sentence:

"These elements make it possible to accommodate multiple images of each 
page, as well as to record arbitrary planar coordinates of textual 
elements on any kind of written surface and to link such elements with 
digital facsimile images of them."

I don't believe that we need the word "textual"; it implies (to me at 
any rate) that non-textual elements on the page cannot be identified by 
<zone>s. Suggest either deletion of the word, or "textual or other 


The description of sourceDoc depends on the phrase "dossier génétique". 
I think this should be glossed in English. I don't know what it should 
be glossed with, of course.


In this sentence:

"Either of the facsimile and sourceDoc elements may be used to represent 
a digital facsimile."

I maintain that "and" should be "or".


The first example of mapping coordinate spaces, using the Karlsruhe 
image, is pointlessly complicated. First, we create a <surface> whose 
coordinate space is not identical with the graphic we're working with, 
then create a <zone> which is larger than the <surface>, with the 
graphic inside the <zone>. Why such complexity? The simplest case would 
be to create a <surface> whose coordinate space is 0, 0, 500, 321, and 
then use <zone>s to define the spaces of interest (the left and right 
pages). My argument is not that what's currently in the document is 
wrong; it's that it's a rather abnormal way to proceed, that it's too 
complicated for the first example of <surface> and <zone>, and that it 
will be off-putting and confusing for readers. I suggest the first 
example should work like this (using the same page-image):

      <zone ulx="37" uly="16" lrx="230" lry="293" xml:id="k95v"></zone>
      <zone ulx="232" uly="16" lrx="416" lry="293" xml:id="k96r"></zone>

In other words, a single surface coterminous with the graphic, with two 
zones established on it, one for each page.
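
For illustration only, the complete structure might look something like 
the sketch below. The <surface> coordinates are the 0, 0, 500, 321 
mentioned above and the two <zone>s are the ones I give; the <graphic> 
URL is a made-up placeholder, not the one used in the draft.

      <facsimile>
        <surface ulx="0" uly="0" lrx="500" lry="321">
          <!-- placeholder URL; substitute the actual page-image -->
          <graphic url="page-image.jpg"/>
          <!-- left-hand page -->
          <zone ulx="37" uly="16" lrx="230" lry="293" xml:id="k95v"/>
          <!-- right-hand page -->
          <zone ulx="232" uly="16" lrx="416" lry="293" xml:id="k96r"/>
        </surface>
      </facsimile>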

If necessary, this simple example could be re-worked into the way it 
currently appears, with the addition of a good explanation of why one 
might do this, but I think the simple example needs to come first; it 
will be what most people want to do, and will be all that many people 
actually need. In fact, the Bovelles example which is carefully 
worked-out below starts from this simple approach, but many readers will 
not get that far because they will stumble at the first, more confusing, 


In Figure 3, Zones within a surface, the added zone boundaries should be 
in a different colour from the original image, so it's clear to the 
reader that they are an artifact of the encoding, not part of the 
original page.


The first example of using the @points attribute, on the Bovelles image, 
is pointlessly complicated:

     points="4.88147,31.0344 5.46483,30.7339 5.58857,32.2011 
5.85374,32.8022 6.10123,33.4386 5.53554,33.7744 5.11128,33.3679 

Why not just use integers here? Nothing is gained by five decimal 
places, other than to slightly intimidate the reader. Most uses of 
@points will use whole numbers (based on pixels within the image, below 
which there is little purpose in descending).
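
By way of comparison, a whole-number version might look like the sketch 
below. The pixel values are invented purely to show the shape of the 
attribute (they are not derived from the Bovelles image), and the 
xml:id is a hypothetical one.

      <!-- hypothetical zone; the coordinates are invented pixel values -->
      <zone xml:id="exampleZone"
            points="49,310 55,307 56,322 59,328 61,334 55,338 51,334"/>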




In the Bovelles transcription, which links with the image further up the 
page, the zone "B49rHead" is defined to contain both <head> elements 
that appear at the beginning of the <div> (including "Chapitre 
septiesme"). However, in the transcription example, only the first 
<head> is linked to that <zone> using @facs. I suggest that either:

  - The transcription be modified to contain a single <head>, so it can
    be unambiguously linked to the <zone>, or

  - The image of zones be modified to split that zone into two, so that
    each can be linked to its appropriate <head>.
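
A rough sketch of the second option follows. The two zone identifiers 
are placeholders, coordinates are omitted, and I am assuming that 
"Chapitre septiesme" is the second of the two headings; the text of the 
first heading is left out.

      <!-- in the facsimile: one zone per heading (ids are hypothetical) -->
      <zone xml:id="B49rHead1"/>
      <zone xml:id="B49rHead2"/>

      <!-- in the transcription: each <head> points at its own zone -->
      <div>
        <head facs="#B49rHead1"><!-- text of the first heading --></head>
        <head facs="#B49rHead2">Chapitre septiesme</head>
      </div>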



This whole section, which is tiny, seems superfluous to me. Its contents 
have already been covered above ("a legal TEI document may thus comprise 
any of the following : ..."), and if the explanation above is 
insufficient, it should be expanded so this section can be deleted. 
Another way of looking at this section is that it comprises the 
introduction to 11.1.3, in which case it should be folded into it.




In this sentence:

"An embedded transcription is one in which words and other written 
traces are encoded as subcomponents of elements representing the 
physical surfaces carrying them rather than independently of them. "

I recommend a comma after "carrying them".


This sentence might not be true:

"Equally, the encoder may choose to provide only graphics without any 
transcription, to provide only a structured (non-embedded) 
transcription, or to provide any combination of the three."

I don't think <facsimile> + <sourceDoc> + <text> is actually allowed, is 
it? If it is, then it needs to be included in the list further up the 
page ("a legal TEI document may thus comprise any of the following : ").


More in a bit...


Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)
