[tei-council] text and image encoding

Mon Mar 19 08:00:33 EST 2007

I'm sorry to report that Dot and I have not made as much progress as we should have on this work item: http://tei.oucs.ox.ac.uk/trac/TEIP5/ticket/291

After the last teleconference I posted the ODD to the Wiki:
http://www.tei-c.org.uk/wiki/index.php/FacsimileMarkupODD

Since then we've been working on it sporadically, but we haven't yet updated the Wiki with anything new. 

I have mostly integrated the discursive text into the ODD (locally); I expect to finish that tomorrow-ish and upload it then.

I've also been working on test instance documents, generating them from PDF with the open-source pdftohtml tool.

We have been looking at the Image Markup Tool and the Edition Production Technology system to see how they relate to the draft. The current draft has two mechanisms, which correspond pretty well to these two schemes respectively.

The IMT schema essentially defines regions of images (using SVG) and uses div elements as annotations of those regions, linking from the div/@n to the svg:rect/@id. That linkage is a bit implicit and informal, but structurally it maps closely to our draft: we have a region element which is the equivalent of a rect. In our schema, an annotation could be done as a note linking to the region.
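To make the div/@n to svg:rect/@id linkage concrete, here is a rough Python sketch that resolves each annotation to the rectangle it points at. The sample document is invented for illustration (the element layout, ids and coordinates are assumptions, not an actual IMT file); only the n-to-id matching reflects the scheme described above.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical sample, NOT a real IMT file: rects carry @id,
# and each div points at a rect through its @n attribute.
SAMPLE = """<doc xmlns:svg="http://www.w3.org/2000/svg">
  <svg:svg>
    <svg:rect id="r1" x="10" y="20" width="100" height="40"/>
    <svg:rect id="r2" x="10" y="70" width="100" height="40"/>
  </svg:svg>
  <div n="r1">Annotation for the first region.</div>
  <div n="r2">Annotation for the second region.</div>
  <div n="r9">Annotation with no matching region.</div>
</doc>"""


def link_annotations(xml_text):
    """Return (n, box, text) for every div; box is None if no rect matches."""
    root = ET.fromstring(xml_text)
    # Index the rectangles by their @id so each div/@n is a dictionary lookup.
    rects = {r.get("id"): r for r in root.iter(f"{{{SVG_NS}}}rect")}
    links = []
    for div in root.iter("div"):
        rect = rects.get(div.get("n"))
        box = None
        if rect is not None:
            box = tuple(int(rect.get(k)) for k in ("x", "y", "width", "height"))
        links.append((div.get("n"), box, (div.text or "").strip()))
    return links


if __name__ == "__main__":
    for n, box, text in link_annotations(SAMPLE):
        print(n, box, text)
```

The third div is there to show why the linkage is informal: nothing in the markup itself stops an @n from pointing at a rect that does not exist, so the check has to be done by whatever processes the file.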

The EPT system is more about linking text to graphical images of that text, and it also maps pretty well onto the draft, using the @left, @right, etc. attributes.