[tei-council] facsimile draft

Thu Aug 9 05:00:02 EDT 2007

Conal Tuohy wrote:
> Regarding <att>box</att>, I agree with the others who've suggested that the origin of our coordinate spaces should be at the upper left, and that positive values are to the right, and below, the origin. That's pretty conventional in my experience. I like the name of <att>box</att>! 
>   
OK
> Can you comment on the new facsimile/front and facsimile/back - what would be encoded there?
>
>   
Same as <front> and <back> elsewhere -- I was imagining that you might 
want to associate some text or other with the facsimile itself, distinct 
from front and back of the transcription. It might also provide a home 
for some additional image-specific metadata if we decide not to put that 
in the header.

> My main set of issues relates to the changes about how graphics relate to zones, and to surfaces, and how textual elements relate to zones or graphics. This is quite a different system - can you explain the rationale behind the changes? 
>
> In the previous draft, it was possible to assign graphical coordinates to individual elements in a transcription, but this is now dropped. What was the reason for that? I am pretty sure that Dot was particularly keen on that feature, and I was also convinced of its utility. 
The problem with the way this was done before (and I may have 
misunderstood) is that you needed special rules to tell you which 
<graphic> the @coords supplied on an element were relative to (see 
earlier discussion with Sebastian).  URL pointing solves this by means 
of xml:base, which might have been a good solution, if only there were 
an XPointer scheme for locating boxes within a graphic that we all 
agreed on. We could define our own but I felt we ought to learn our 
lesson from last time and wait for someone else to do it!

> For instance, imagine a TEI transcript originating in an OCR process, which would have image coordinates assigned to each word by the OCR software. Using your draft markup, if I understand correctly, it would be necessary to create a distinct zone element for each word, essentially a parallel of the transcription, and link each word in the transcript to its corresponding zone. This would be quite an overhead!
>   

Why? It's an automated process anyway, isn't it? You would indeed need 
to define a zone for each word, but wouldn't that be precisely what the 
OCR process outputs anyway?
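
A minimal sketch of what that automated step might look like. The input 
format (a list of word/coordinate tuples), the function name, and the 
four-number syntax used for @box are all assumptions made for 
illustration; the draft may specify the datatype differently.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def words_to_tei(words, image_url):
    """Build a <facsimile> with one <zone> per OCR word, plus a <p> of linked <w>s.

    `words` is a list of (text, left, top, right, bottom) tuples, a
    hypothetical OCR output format. The space-separated @box value is an
    assumption, not the draft's confirmed syntax.
    """
    facsimile = ET.Element("facsimile")
    surface = ET.SubElement(facsimile, "surface")
    ET.SubElement(surface, "graphic", {"url": image_url})
    para = ET.Element("p")
    for n, (text, left, top, right, bottom) in enumerate(words, start=1):
        zone_id = f"z{n}"
        # One zone per word, carrying the word's bounding box.
        ET.SubElement(surface, "zone", {
            f"{{{XML_NS}}}id": zone_id,
            "box": f"{left} {top} {right} {bottom}",
        })
        # The transcribed word points back at its zone via @facs.
        w = ET.SubElement(para, "w", {"facs": f"#{zone_id}"})
        w.text = text
    return facsimile, para
```

Calling words_to_tei([("first", 10, 20, 60, 40)], "page1.png") would give 
a facsimile holding a single zone and a paragraph holding a single <w> 
that points to it, so the parallel structure costs nothing to produce 
by hand.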

> I also think that the value space for @facs is too loose - in the sense that a <p> or a <div> could use a @facs pointer to point to either an image file, to a zone, or to a graphic. I have a feeling this is not going to be so convenient for processing. In the previous draft, the idea was that such links would be ONLY to zones, which were facsimile equivalents of <anchor> elements in a transcription. 
>
>   
We can't enforce this kind of rule (even for <anchor>s) -- it's a 
data.pointer and it can point anywhere it pleases. I felt it was useful 
to spell out what it *means* when it points to different kinds of thing. 
An application can of course choose not to support a particular class of 
target, but that's a different issue.
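
As a rough illustration of how an application might sort out what a 
given @facs value points to, here is a sketch. The function name and 
the returned labels are hypothetical, and it assumes that any pointer 
without a leading "#" is an external image file, which a real 
processor would need to check more carefully.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def classify_facs_target(doc_root, facs):
    """Return 'zone', 'graphic', 'image-file', 'other', or 'unknown'."""
    if not facs.startswith("#"):
        # Assumption: a non-fragment pointer refers to an image file.
        return "image-file"
    target_id = facs[1:]
    for el in doc_root.iter():
        if el.get(XML_ID) == target_id:
            # Found the target; report which kind of thing it is.
            return el.tag if el.tag in ("zone", "graphic") else "other"
    return "unknown"  # no element with that xml:id in this document
```

An application that only supports zones can then simply ignore, or 
warn about, any pointer that does not come back as 'zone'.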

> You've also allowed <graphic> inside <zone>, and I'm having a hard time understanding the rationale for this change. It seems to be of a piece with the change to remove <graphic> from att.coordinated. Now, since a graphic has no @box of its own, it inherits one from its parent <zone>, is that right? 
<graphic> inside <zone> means the same as <graphic> inside <surface> 
(you may recall that I wanted to use <surface> recursively) -- this is 
an image of the zone/graphic defined here, so yes: the bounding box of 
any graphic inside a <zone> is defined by the parent zone.
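
A small sketch of that inheritance rule, under stated assumptions: the 
function name is hypothetical, @box is treated as an opaque string, 
and a graphic whose parent is not a <zone> is simply reported as 
having no box, since nothing here says what it should inherit.

```python
import xml.etree.ElementTree as ET


def graphic_boxes(facsimile):
    """Map each <graphic> url to the @box of its parent <zone>, or None."""
    boxes = {}
    # ElementTree has no parent pointers, so walk parents and inspect children.
    for parent in facsimile.iter():
        for child in parent:
            if child.tag == "graphic":
                box = parent.get("box") if parent.tag == "zone" else None
                boxes[child.get("url")] = box
    return boxes
```

Two graphics placed in the same zone come back with the same box, 
which is the case of several graphic realisations of one space.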

I wanted to avoid change to <graphic>, if at all possible. And I also 
wanted to separate the co-ordinate information from the graphical 
pointing information.

>  In my previous draft, a graphic had a @box (or @coords as it was still called) attribute of its own, and hence didn't need to be enclosed in a zone, and I don't see why we'd want to wrap those graphics in zones, when they could just have their own @box. What does that gain us?
>
>   
A clearer separation of concepts, imho. Plus the ability to give 
multiple graphic realisations for the same space in a relatively 
non-prolix manner.

> Requiring graphics to be contained in zones would be convenient to the extent that the distinct graphics correspond exactly to areas of interest (i.e. if they have been exactly cropped to that size), but I'm not sure this is likely to be a common case. It seems to me more likely that graphics will tend to be larger than zones, in almost all cases. Hence there would need to be an analytical zone (highlighting the area of interest) and a graphical zone (to contain a graphic which showed the area of interest). Only if the graphic had been cropped to exactly cover the area of interest could its parent zone be accurately used as an analytical zone, and linked to a piece of transcript. 
>
>   
This depends on how you define the zone/surface, surely?

> Removing graphic from zone (and giving graphic its own @box) would mean that zones would be always empty, and this would simplify processing, too, I believe.
>
>   
Because empty elements are easier to process than full ones? I find that 
hard to believe!

> Regarding the "short-cut" which allows facsimile/graphic instead of requiring facsimile/surface/graphic, this seems reasonable, though I wonder if there's much prospect of people using this short-cut, and if not, I think the shortcut should be abolished (to simplify processing). The reason I doubt it would be popular is that if you have a single graphic, you already have the option of linking to it directly from a pb, which is an even shorter short-cut. If you use the facsimile/graphic shortcut (i.e. a graphic as a direct child of facsimile, rather than mediated by a surface), you don't have the option of using zones anyway, so this slightly-longer shortcut doesn't cater for any distinct use case as far as I can see.
>   
See my comment to Dan before breakfast. It seems a good idea to have a 
clear distinction between graphics in the text and graphics representing 
the text, and to have a place where all the information about the latter 
can be collected together. But I agree it won't seem that short a cut to 
people who just want to pepper their transcriptions with explicit 
pointers off into the wild blue yonder with no concern for the morrow... 
such people are probably beyond help anyway.

> In short, I'm a bit flummoxed. I liked the linking better the way it was.
>
>   

Well, fair enough. I apologize if you feel I've messed up your ideas 
completely, and am very grateful for your willingness to engage in the 
debate. I think there's been a pretty convincing groundswell of approval 
for the direction things are going so the process can't be all bad.

> Con
>
> -----Original Message-----
> From: tei-council-bounces at lists.village.Virginia.EDU on behalf of Lou Burnard
> Sent: Mon 06/08/07 9:07
> To: tei-council at lists.village.Virginia.EDU
> Subject: [tei-council] facsimile draft 
>  
> As mentioned in the call, I've been working on trying to produce a 
> section about facsimile markup which could be plugged into the current 
> chapter on physical transcription, using as many as possible of the 
> ideas discussed here by Conal and others over the last few weeks.
>
> Time is running out, and we need to get closure on this, so I hope Conal 
> and Dot will excuse me for steaming ahead on this without consulting 
> with them privately first. I've used the documents circulated and 
> followed (as far as I can) the discussion so far to produce a 
> straw-person kind of a draft which is now posted for your (particularly 
> their) urgent attention at http://www.tei-c.org/Drafts/facs.odd
>
> I've deliberately restricted the scope of what this draft makes possible 
> to what I hope we can all agree on as a bare minimum of functionality. 
> It supports linking from text to image and image to text with a minimum 
> of fuss; it also supports linking between text and image fragments, but 
> only provides one way to do it. It tries to fit in with existing TEI 
> idiom and practice.
>
> It is however in desperate need of help on the following counts:
>
> -- I haven't the faintest idea how to transcribe the Old English ms 
> we're using as an example. (The one Conal circulated earlier). Either 
> someone needs to transcribe it for me, or I need to find another example 
> which I can transcribe. (Actually, as this one claims to be copyright of 
> the Bodleian, the second is probably the wiser course.)
>
> -- in defining how the co-ordinate system works, I have had to rely on 
> my vague recollections of O level maths. Someone who actually knows 
> about this stuff should read it carefully to see how plausible this is. 
> Also how feasible it is to implement it!
>
> -- I've also made a wild guess about how to specify the datatype of my 
> @box attribute (formerly known as @coords -- I renamed it because it is 
> considerably more restricted than the XHTML coords attribute)
>
> All comments, bouquets, and brickbats gratefully received
>
> Lou