[tei-council] facsimile - how to do stand-off facsimile markup?
Conal Tuohy
Conal.Tuohy at vuw.ac.nz
Thu Jul 26 10:36:38 EDT 2007
The other requirement we've tried to cover in our proposal is the ability to add facsimile markup to an existing transcript in a stand-off way. Rather than adding @coords to each <p>, <head>, <ab> or whatever, the coordinates would be attached to another element, representing the area on the page, and that area would be linked with @corresp (or some other pointer mechanism) to the textual elements which fall within that area.
This is the function of the <area> element in our proposal. I would be happy to rename this <zone>, which actually accords with the technical term used by people who are using OCR to produce this kind of facsimile edition, I believe.
For example:
A single-page document with 2 facsimile images, one of which is 10x the resolution of the other.
-----------------------------------
<facsimile>
<surface start="#p1">
<graphic url="p1.jpg" scale="1"/>
<graphic url="p1-thumbnail.jpg" scale="10"/>
</surface>
</facsimile>
...
<text>
...
<pb xml:id="p1"/>
...
<!-- the word "Foo" occupies the square whose top-left corner is (10,10) and
whose bottom-right corner is (20,20) -->
<ab coords="10 10 20 20">Foo</ab>
...
</text>
-----------------------------------
This could be equivalently encoded in a stand-off style like so:
-----------------------------------
<facsimile>
<surface start="#p1">
<graphic url="p1.jpg" scale="1"/>
<graphic url="p1-thumbnail.jpg" scale="10"/>
<!-- the word "Foo" occupies the square whose top-left corner is (10,10) and
whose bottom-right corner is (20,20) -->
<area coords="10 10 20 20" corresp="#foo"/>
</surface>
</facsimile>
...
<text>
...
<pb xml:id="p1"/>
...
<ab xml:id="foo">Foo</ab>
...
</text>
-----------------------------------
I think that Martin Holmes also wanted to be able to link such <zone> elements to milestoneLike elements in the text, in order to be able to more easily handle cases where paragraphs were split over columns or pages. I was thinking that <zone> could have @start and @end attributes pointing at milestones, to be used as an alternative to @corresp. But it's late and I'll leave this until tomorrow.
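For the record, a rough sketch of how the milestone idea might look, for a paragraph split over two columns. Everything specific here is an assumption for illustration only: the @start and @end attributes on <zone>, the use of <anchor> as the milestone, the xml:id values, and the coordinates are all hypothetical and not part of the proposal as it stands.
-----------------------------------
<facsimile>
<surface start="#p1">
<graphic url="p1.jpg" scale="1"/>
<!-- hypothetical: each zone covers the text lying between two milestones,
instead of pointing at a whole element with @corresp -->
<zone coords="10 10 50 90" start="#col1-start" end="#col1-end"/>
<zone coords="60 10 100 40" start="#col2-start" end="#col2-end"/>
</surface>
</facsimile>
...
<text>
...
<pb xml:id="p1"/>
...
<p><anchor xml:id="col1-start"/>First part of the paragraph,<anchor xml:id="col1-end"/>
<cb/>
<anchor xml:id="col2-start"/>continued in the second column.<anchor xml:id="col2-end"/></p>
...
</text>
-----------------------------------
The point of the sketch is that a single <p> can be covered by two zones without being split into two elements, since each zone addresses a run of text between milestones rather than a whole element.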
Finally, I'm open-minded regarding the need for a different attribute to @corresp - I suggested @fax earlier and Lou suggested @facs, and I think Syd also preferred using an attribute with distinctly "facsimilar" semantics, rather than the multi-purpose @corresp.