[tei-council] facsimile - how to do stand-off facsimile markup?

Conal Tuohy Conal.Tuohy at vuw.ac.nz
Thu Jul 26 10:36:38 EDT 2007


The other requirement we've tried to cover in our proposal is the ability to add facsimile markup to an existing transcript in a stand-off way. Rather than adding @coords to each <p>, <head>, <ab> or whatever, the coordinates would be attached to a separate element representing an area on the page. That area would then be linked with @corresp (or some other pointer mechanism) to the textual elements which fall within it.

This is the function of the <area> element in our proposal. I would be happy to rename it <zone>, which I believe is the technical term used by people who use OCR to produce this kind of facsimile edition.

For example:

A single-page document with 2 facsimile images, one of which is 10x the resolution of the other.
-----------------------------------
<facsimile>
   <surface start="#p1">
      <graphic url="p1.jpg" scale="1"/>
      <graphic url="p1-thumbnail.jpg" scale="10"/>
   </surface>
</facsimile>
...
<text>
   ...
   <pb xml:id="p1"/>
   ...
   <!-- the word "Foo" occupies the square whose top-left corner is (10,10)
   and whose bottom-right corner is (20,20) -->
   <ab coords="10 10 20 20">Foo</ab>
   ...
</text>
-----------------------------------

This could be equivalently encoded in a stand-off style like so:

-----------------------------------
<facsimile>
   <surface start="#p1">
      <graphic url="p1.jpg" scale="1"/>
      <graphic url="p1-thumbnail.jpg" scale="10"/>
      <!-- the word "Foo" occupies the square whose top-left corner is (10,10)
      and whose bottom-right corner is (20,20) -->
      <area coords="10 10 20 20" corresp="#foo"/>
   </surface>
</facsimile>
...
<text>
   ...
   <pb xml:id="p1"/>
   ...
   <ab xml:id="foo">Foo</ab>
   ...
</text>
-----------------------------------
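
To illustrate how a processor might follow these stand-off links, here is a minimal Python sketch, not a definitive implementation. It assumes the two fragments above are wrapped in a single un-namespaced <TEI> root (added here only so the sample parses), that @coords holds four whitespace-separated integers, and that @corresp holds a single pointer of the form #id. The function name resolve_areas is hypothetical and not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Expanded name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The stand-off example, wrapped in an assumed <TEI> root element.
DOC = """
<TEI>
  <facsimile>
    <surface start="#p1">
      <graphic url="p1.jpg" scale="1"/>
      <graphic url="p1-thumbnail.jpg" scale="10"/>
      <area coords="10 10 20 20" corresp="#foo"/>
    </surface>
  </facsimile>
  <text>
    <pb xml:id="p1"/>
    <ab xml:id="foo">Foo</ab>
  </text>
</TEI>
"""


def resolve_areas(xml_text):
    """Return (coords, target tag, target text) for each <area> with a resolvable @corresp."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an xml:id.
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    resolved = []
    for area in root.iter("area"):
        # Assumption: @coords is "x1 y1 x2 y2" as integers.
        coords = tuple(int(n) for n in area.get("coords").split())
        # Assumption: @corresp is a single "#id" pointer into the same document.
        target = by_id.get(area.get("corresp", "").lstrip("#"))
        if target is not None:
            resolved.append((coords, target.tag, target.text))
    return resolved
```

Calling resolve_areas(DOC) pairs the rectangle (10, 10, 20, 20) with the <ab> whose xml:id is "foo". Note that the transcript itself needs nothing beyond the xml:id for this to work.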

I think that Martin Holmes also wanted to be able to link such <zone> elements to milestoneLike elements in the text, to make it easier to handle cases where paragraphs are split over columns or pages. I was thinking that <zone> could have @start and @end attributes pointing at milestones, to be used as an alternative to @corresp. But it's late and I'll leave this until tomorrow.
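
For concreteness, one possible reading of the @start/@end idea can be sketched in Python. Everything here is an assumption for illustration: the <zone> element with @start and @end is only a suggestion at this point, the sample uses <cb/> (column break) elements as the milestones, the <TEI> wrapper and the coordinate values are invented, and the helper names walk and zone_text are hypothetical. The sketch treats a zone as covering all text in document order from its start milestone up to its end milestone.

```python
import xml.etree.ElementTree as ET

# Expanded name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical markup: a paragraph that runs across two columns.
DOC = """
<TEI>
  <facsimile>
    <surface start="#p1">
      <zone coords="0 0 50 100" start="#c1" end="#c2"/>
    </surface>
  </facsimile>
  <text>
    <pb xml:id="p1"/>
    <cb xml:id="c1"/>
    <p>Start of a paragraph in column one
    <cb xml:id="c2"/>and its end in column two.</p>
  </text>
</TEI>
"""


def walk(el):
    """Yield ('start', element) and ('text', string) events in document order."""
    yield ("start", el)
    if el.text:
        yield ("text", el.text)
    for child in el:
        yield from walk(child)
        if child.tail:
            yield ("text", child.tail)


def zone_text(root, zone):
    """Collect the text lying between the zone's start and end milestones."""
    start_id = zone.get("start").lstrip("#")
    end_id = zone.get("end").lstrip("#")
    collecting = False
    parts = []
    for kind, value in walk(root):
        if kind == "start":
            ident = value.get(XML_ID)
            if ident == start_id:
                collecting = True
            elif ident == end_id:
                break
        elif collecting:
            parts.append(value)
    # Collapse runs of whitespace left over from the markup's indentation.
    return " ".join("".join(parts).split())


root = ET.fromstring(DOC)
zone = root.find("facsimile/surface/zone")
```

Here zone_text(root, zone) returns only the part of the paragraph that falls in the first column, which is the case that a single @corresp pointing at the whole <p> cannot express.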

Finally, I'm open-minded regarding the need for a different attribute to @corresp - I suggested @fax earlier and Lou suggested @facs, and I think Syd also preferred using an attribute with distinctly "facsimilar" semantics, rather than the multi-purpose @corresp.




