[tei-council] encoding page scans

Thu Dec 15 17:38:52 EST 2005

Hi Conal,

Some comments on your comments...

On 12/14/05, Conal Tuohy <Conal.Tuohy at vuw.ac.nz> wrote:
> Dot Porter wrote:
>
<snip/>
> > depends on @coords, which can be added to any tag and which indicates
> > where the tag contents reside on the image. Unfortunately this only
> > allows for one set of coordinates, even if there are multiple image
> > files for the same page. For another project, I've been working on a
>
> The same problem as above then - the links should go from the image to
> the text, rather than the other way? Or we could use link/@targets to
> encode n-ary links, i.e. a single link element indicating a
> correspondence between some text markup and 1 or more graphics (or
> regions within graphics)? The weakness of link/@targets for multiple
> links is that it doesn't attach any semantics to the different links.
>

Oh, I don't really like this idea for the very same reason you give -
I'd rather have a linking system where the relationships between the
text and image are defined, rather than just noted. (I've actually
stopped using linkGrp/link completely... I use METS instead...)

> > system for encoding multiple sets of coordinates in a METS Structural
> > Map (stored in a separate file, rather than as a wrapper for the TEI),
> > and it seems to work pretty well.
>
> I'm sure it does :-) and I've read the METS profiles for doing it[1]
> though I've never done it myself ... but do you think it should really
> be necessary to go to these lengths?

I rather hope it isn't *necessary*, but it does seem to work - though
it is incredibly intensive on the encoding side. I've just built a
small sample of material, using oXygen, and taking coordinates from
EPT and Photoshop. It would be impossible to do more without special
software support.

> <snip/>
> The main thing, to my
> mind, is that the guidelines should clearly document some standard
> practice that doesn't involve treating page scans as figures, or making
> custom extensions to the TEI schema.
>
yes, yes, yes

> Dot, could you post a little example of the METS and TEI markup you are
> using to associate an image file with a TEI page?
>

Boy, okay. The project is an edition of the Venetus A manuscript - the
earliest surviving complete copy of the Iliad, containing the main text
plus several layers of annotation.

Much simplified: First, we encode the main text and the annotations
separately (in separate documents, one document per book). In the main
text, the poetic lines (<l>) are assigned IDs based on the book and
line number - for example, Book 4 line 537 has xml:id="book.4.537".
IDs for the annotations are assigned based on the type of annotation
(marginal, interlinear, etc.), book number, the group number
(annotations are in groups of two or more), and then the number of the
specific annotation in a group. So an ID for a group of marginal
annotations would be xml:id="m.4.15", while the first annotation in
the group would be xml:id="m.4.15.1".
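The ID conventions above are regular enough to generate and check
mechanically. Here is a minimal Python sketch, assuming the prefixes
"book" and "m" from the examples; the function names are hypothetical
and not part of any existing tool:

```python
from typing import Optional


def line_id(book: int, line: int) -> str:
    """xml:id for a poetic line <l> of the main text, e.g. "book.4.537"."""
    return f"book.{book}.{line}"


def annotation_id(kind: str, book: int, group: int, item: Optional[int] = None) -> str:
    """xml:id for an annotation group, or for one annotation inside it.

    kind  -- annotation-type prefix ("m" for marginal; other prefixes are assumptions)
    group -- number of the group within the book
    item  -- position of the annotation in the group; None means the group itself
    """
    parts = [kind, str(book), str(group)]
    if item is not None:
        parts.append(str(item))
    return ".".join(parts)


def parse_annotation_id(xml_id: str) -> dict:
    """Split an annotation xml:id such as "m.4.15.1" back into its parts."""
    kind, book, group, *rest = xml_id.split(".")
    return {
        "kind": kind,
        "book": int(book),
        "group": int(group),
        "item": int(rest[0]) if rest else None,
    }


print(line_id(4, 537))               # book.4.537
print(annotation_id("m", 4, 15))     # m.4.15
print(annotation_id("m", 4, 15, 1))  # m.4.15.1
```

Parsing the IDs back into parts like this would let a script check that
every annotation in a group shares the group's book and group number.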

In the METS file, there are three file groups: two for the TEI files
and one for the facsimile scans. Each file is assigned its own ID. We
then use structural maps to link the text with the corresponding
image. In this example, we note the location (using COORDS) of the
first group of annotations in book 4 on the image:

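<structMap LABEL="Marginal Scholia">
  <div LABEL="Book 4">
    <div ID="Am.4.1">
      <!-- region of the page image that holds this annotation group -->
      <fptr>
        <area FILEID="id-2001.01.0092" SHAPE="RECT" COORDS="368,842,1600,1656"/>
      </fptr>
      <!-- the same group in the TEI annotation file, by xml:id -->
      <fptr>
        <area FILEID="id-sch-4" BETYPE="IDREF" BEGIN="Am.4.1"/>
      </fptr>
    </div>
    <!-- ...further annotation groups... -->
  </div>
</structMap>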

FILEID points to a file declared in one of the file groups; COORDS
records the region's coordinates on the image file; BEGIN records the
xml:id of the corresponding element (here, the annotation group) in the
TEI file (a span covering multiple lines would have both BEGIN and END).
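To make the linking concrete, here is a minimal Python sketch (standard
library only) that reads a cut-down METS file and resolves each
annotation group to its image region and its TEI target. The file names,
the USE labels and the "id-text-4" file are hypothetical; SHAPE="RECT"
and BETYPE="IDREF" are included because METS defines them on <area> to
say how COORDS and BEGIN are to be read.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
NS = {"mets": METS_NS}

# Cut-down METS: three file groups (two TEI, one facsimile) and one structMap.
# File names, USE labels and "id-text-4" are hypothetical.
SAMPLE = f"""<mets xmlns="{METS_NS}" xmlns:xlink="{XLINK_NS}">
  <fileSec>
    <fileGrp USE="TEI main text">
      <file ID="id-text-4"><FLocat LOCTYPE="URL" xlink:href="text-book4.xml"/></file>
    </fileGrp>
    <fileGrp USE="TEI scholia">
      <file ID="id-sch-4"><FLocat LOCTYPE="URL" xlink:href="scholia-book4.xml"/></file>
    </fileGrp>
    <fileGrp USE="facsimile">
      <file ID="id-2001.01.0092"><FLocat LOCTYPE="URL" xlink:href="folio.jpg"/></file>
    </fileGrp>
  </fileSec>
  <structMap LABEL="Marginal Scholia">
    <div LABEL="Book 4">
      <div ID="Am.4.1">
        <fptr><area FILEID="id-2001.01.0092" SHAPE="RECT" COORDS="368,842,1600,1656"/></fptr>
        <fptr><area FILEID="id-sch-4" BETYPE="IDREF" BEGIN="Am.4.1"/></fptr>
      </div>
    </div>
  </structMap>
</mets>"""


def resolve(mets_xml: str) -> dict:
    """Map each structMap div ID to its image region and its TEI target."""
    root = ET.fromstring(mets_xml)
    # file ID -> location, taken from each file's FLocat/@xlink:href
    hrefs = {
        f.get("ID"): f.find("mets:FLocat", NS).get(f"{{{XLINK_NS}}}href")
        for f in root.iterfind(".//mets:file", NS)
    }
    links = {}
    # Only divs carrying an ID (the annotation groups) are resolved
    for div in root.iterfind(".//mets:div[@ID]", NS):
        entry = {}
        for area in div.iterfind("mets:fptr/mets:area", NS):
            href = hrefs[area.get("FILEID")]
            if area.get("COORDS") is not None:
                # RECT coordinates: left, top, right, bottom (HTML image-map order)
                coords = tuple(int(n) for n in area.get("COORDS").split(","))
                entry["image"] = (href, coords)
            elif area.get("BEGIN") is not None:
                # BEGIN (and END, if present) are xml:ids in the TEI file
                entry["text"] = (href, area.get("BEGIN"), area.get("END"))
        links[div.get("ID")] = entry
    return links


for div_id, entry in resolve(SAMPLE).items():
    print(div_id, entry)
```

A display tool could use a table like this to highlight the image region
when a reader selects an annotation, or the reverse.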

I also put together a PDF presentation, in the hope that I wouldn't
completely confuse the non-technical editors. It's in this .zip file,
along with a sample METS file:
http://www.rch.uky.edu/Venetus/Venetus-METS.zip

> > The UVic Image Markup Tool[2] uses
> > SVG within the TEI body to link to both file and coordinates;
>
> That's an interesting example. I agree that inline SVG could be a good
> way to mark up regions within a graphic. Just a quibble, though: in your
> example, is the link between the graphical region and the text
> represented by a purely conventional correspondence between the @n and
> the @xml:id attributes? The guidelines suggest using a ptr to
> provide a TEI-namespaced proxy for the SVG region, then linking the ptr
> and the text markup with a TEI <link> element.
>
Well, unfortunately, as it is now, the system the UVic IMT uses
doesn't actually link graphical regions and text at anything finer
than the page level - it's really a tool (and a markup system) for
annotating image areas, rather than a tool for linking images to text.
I think that the concept of combining SVG with TEI has applications
for the question at hand, though.
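Conal's ptr-plus-link pattern can be sketched the same way. This minimal
Python sketch builds an SVG <rect> for a region, a TEI <ptr> standing in
for it, and a <link> joining the ptr to a line of text. The rectangle
values and the ptr ID scheme are hypothetical, the attribute names follow
the message (link/@targets), and the "#"-style references and the TEI P5
namespace are assumptions.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # assumption: TEI P5 namespace
SVG_NS = "http://www.w3.org/2000/svg"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("tei", TEI_NS)
ET.register_namespace("svg", SVG_NS)


def region_link(region_id: str, rect: tuple, text_id: str) -> tuple:
    """Build (svg:rect, tei:ptr, tei:link) for one region/text correspondence.

    region_id -- id of the SVG rectangle marking the region
    rect      -- (x, y, width, height) of the region on the page image
    text_id   -- xml:id of the TEI element the region corresponds to
    """
    x, y, width, height = rect
    svg_rect = ET.Element(f"{{{SVG_NS}}}rect", {
        "id": region_id,
        "x": str(x), "y": str(y),
        "width": str(width), "height": str(height),
    })
    # TEI-namespaced proxy that points at the SVG region
    ptr_id = f"ptr.{region_id}"
    ptr = ET.Element(f"{{{TEI_NS}}}ptr", {XML_ID: ptr_id, "target": f"#{region_id}"})
    # The link pairs the proxy with the text markup; it records no roles
    link = ET.Element(f"{{{TEI_NS}}}link", {"targets": f"#{ptr_id} #{text_id}"})
    return svg_rect, ptr, link


# Hypothetical region for line 537 of book 4
for element in region_link("region-1", (100, 200, 1200, 60), "book.4.537"):
    print(ET.tostring(element, encoding="unicode"))
```

Note that the link simply lists its two targets without saying which is
the image and which is the text, which is the same lack of semantics
Conal points out for link/@targets.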

> > another
> > approach might be to have a TEI module for incorporating image files
> > and their areas in a project.
>
> This last is the approach I think I would prefer. One of my criteria for
> such a feature would be that in the simplest case it should be dead easy
> to associate a page image with a page. The METS approach is
> perfectly capable, but it's probably not so convenient for encoders, I
> would guess.

I would not call it "convenient", no. But I've had fun with it.

> Having a separate file is an extra hassle, for a start,

Yes.

> though perhaps a "METS" TEI module could be produced which would add
> METS as a root element, and introduce the TEI element embedded in the
> METS structure map?

This is interesting. Or a METS structural map in the TEI Header?

> Could the same be done with SVG? That could be a way
> to at least provide recommended encoding mechanisms defined using the
> standard TEI customisation practice.
>

I don't know. But I do think that we'll be better off if we do take a
good look at what's already out there for describing images (METS, SVG
- others?) and get together a group of people who have been thinking
about these issues. So to Sebastian's last question - I think a work
group would be a useful step.

Talk to y'all tomorrow!

Dot

> Cheers
>
> Con
>
> [1] http://www.loc.gov/standards/mets/profiles/00000005.xml
> _______________________________________________
> tei-council mailing list
> tei-council at lists.village.Virginia.EDU
> http://lists.village.Virginia.EDU/mailman/listinfo/tei-council
>

--
***************************************
Dot Porter, Program Coordinator
Collaboratory for Research in Computing for Humanities
University of Kentucky
351 William T. Young Library
Lexington, KY  40506

dporter at uky.edu          859-257-9549
***************************************