[tei-council] facsimile odd

Fri Jul 20 12:52:43 EDT 2007

I'm returning to Lou's original commentary on the facsimile markup,
and I'll intersperse Conal's responses as well.

On 7/15/07, Lou Burnard <lou.burnard at computing-services.oxford.ac.uk> wrote:
> Unless I've grossly misunderstood it, Conal's proposal may be summarized
> as follows:
>
> a) we define a new element <pg>, a member of model.sourceDescPart
> b) we define a new attribute class, att.projection and make <graphic> a
> member of it, along with a small number of other existing "container"
> elements like <p>, <ab>, and <seg>. Dot also proposes an attribute @coords.
> c) we define a new element <area>, a member of model.graphicLike
> d) <pg> is used as a wrapper for one or more <graphic>s, each
> representing a page image; it can also contain <area>s which define
> particular zones within the page.
> e) <pg> can point into text transcript by means of special attribute
> @start (indicates a <pb/>); <area>s point to elements in the transcript
> using @corresp
>

Yes.

> And here, probably revealing the grossness of my understanding, are some
> comments on each of the above points:
>
> a) I don't think this element belongs in sourceDesc. If <pg> contains
> the images constituting a digital facsimile, then it isn't metadata
> about that facsimile, it *is* the facsimile. I might want to record in
> the sourceDesc other things (e.g. where I nicked the images from) which
> wouldn't form part of the facsimile proper.

CT>But to my mind, the tei:pg elements ARE descriptions of the source
CT>material. I could be misinterpreting this though, I admit.

I'm with Conal on this, I think. sourceDesc is a description of the
source material. "Source material" could be a manuscript, a scroll, a
gravestone, what have you. A facsimile of the source is not the same
thing as the source - a facsimile is metadata about the source. I'd
say that <pg> does belong in sourceDesc, but it could also make sense
to have it at the same level as <text>. Perhaps it should be allowed
in both places? That would be similar to msDescription, which can
appear both in the header and in the body.

> b) the class seems to combine two different kinds of attribute: ones
> like @top and @right which define where something else is within a
> graphic; and ones like @xscale and @rotate which define how a graphic is
> to be rendered in a given context. I really don't understand how these
> attributes are intended to be used though.

CT>You are right that this is perhaps overly complicated. The idea was to
CT>allow textual elements to be identified as being e.g. oriented vertically or
CT>diagonally on the page, etc.

CT>But a simplification would be to distribute the attributes differently
CT>between the 2 attribute classes like so:

CT>att.projection should define scale factors and rotation. The members of this
CT>class should be the model.graphicLike elements which are children of the
CT>tei:pg elements. The projection attributes would define the mapping between
CT>the physical page and the various tei:graphic images of that page.

CT>att.coordinates would define the location of textual elements (or tei:area
CT>elements, which are the stand-off proxies of textual elements) within a page.

I have nothing to add to Conal's comments, this all makes sense to me.
I like his simplification.
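
As a rough sketch of how I read that split (the attribute values and
the #fig1 target are made up for illustration, the file name is
borrowed from Conal's example, and none of this is settled syntax):

```xml
<pg start="#p1">
  <!-- att.projection: how this image maps onto the physical page -->
  <graphic url="full-page-scan-of-p1.jpg" xscale="0.5" rotate="90"/>
  <!-- att.coordinates: where a textual element sits on the page -->
  <area corresp="#fig1" left="10" top="100" width="110" bottom="200"/>
</pg>
```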

> c) <area> doesn't make much sense except with reference to a <graphic>;
> it can't therefore be a member of model.graphicLike, since this would
> allow it to stand in place of a <graphic>

This is definitely an issue. I understand Conal's reply (see just
below), and my understanding is that the att.projection attributes on
the <graphic>s that are children of <pg> will map from the <area>s
(either children of <pg> or in the text) to the <graphic>s in the
header [or wherever <pg> ends up]. This system should also work in the
simple case where there is only one image per source, though in that
case I would prefer a recommendation to use @url on <pb>.

CT>I would say that an <area> which was the child of a <figure>, for instance,
CT>would make sense with reference to the <graphic> images of the page,
CT>which are just the <graphic>s which are children of the <pg> which points
CT>to the <pb> which precedes the <figure>

CT>e.g. schematically:

CT><pg start="#p1">
CT> <graphic url="full-page-scan-of-p1.jpg"/>
CT></pg>

CT>...

CT><pb id="p1"/>

CT>...

CT><figure>
CT>  <area left="10" top="100" width="110" bottom="200"/>
CT>  <figDesc>A square picture of something</figDesc>
CT></figure>

> d) <pg> seems rather restrictive (not to say unpronounceable) as a name:
> could I use it, for example, to wrap images of Sebastian's gravestones?

I really don't like the name <pg> for this element. Syd, Martin and I
suggested changing this to <sourceImage> or <sourceMedia> but as Conal
replied to that suggestion:

CT>As I see it, the point of <pg/> is to
CT>model the flat surfaces on which texts are inscribed (call them pages
CT>or not).

We need to make it clear that this element is not only for book or
manuscript pages but also for inscribed surfaces, scrolls, stone
crosses, graffiti on walls... anything with text written on it.
Conal suggests <surface/> or <plane/>, both of which are fine with me
and much better than <pg>, though on first thought I prefer <surface/>.

> Is the only difference between a <pg> and an <area> that one corresponds
> with a conventional visual unit -- the page -- and the other with any
> arbitrary subsection of it? Suppose each of my images shows a 2-page
> spread: would each one be a <pg> with each page image being an <area>?

To which Conal replied:

CT>No, you'd have 2 <pg> elements, each containing the same tei:graphic/@url
CT>value, but the graphics would have different graphical offsets with respect
CT>to the <pg> which enclosed them. The right-hand page would have a big
CT>negative @left offset to indicate that a vertical line down the centre of
CT>the graphic corresponded to the left edge of the page.

Replacing <pg> with <surface> would mean that an editor would have to
make a decision on this point, I think. The "surface" of a 2-page
spread could be both pages, with <area> used to differentiate between
them, or one could say that <surface> is each individual page and
proceed as Conal suggests above. It seems to me that this really is an
editorial decision and the approach would depend on the type of source
material - a "true" 2-page spread vs. one page of text leading to
another page of text.
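
To make the two options concrete, here is a sketch of each. The file
name, the pixel values, the xml:id targets, and the idea that @left
can sit on <graphic> are all assumptions for illustration, not agreed
syntax:

```xml
<!-- Option A (Conal): one <pg> per page, both using the same image -->
<pg start="#pb-2v">
  <graphic url="spread-2v-3r.jpg" left="0"/>
</pg>
<pg start="#pb-3r">
  <!-- negative offset: this page's left edge is the centre line of the image -->
  <graphic url="spread-2v-3r.jpg" left="-1200"/>
</pg>

<!-- Option B: one <surface> for the whole spread, each page as an <area> -->
<surface>
  <graphic url="spread-2v-3r.jpg"/>
  <area corresp="#pb-2v" left="0" top="0" width="1200" bottom="1800"/>
  <area corresp="#pb-3r" left="1200" top="0" width="1200" bottom="1800"/>
</surface>
```

Option A keeps the 1:1 relationship between page and <pb>; Option B
treats the spread as a single surface, which suits a "true" 2-page
spread better.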

> e) why two different attributes for pointing into the text? How do I
> point from text into image?
>
Here you are referring to @start (points from <pg> to <pb>) and
@corresp (points from <area> to text). I don't recall why we have
@start in addition to @corresp on <pg>, unless it was simply to make
clear the 1:1 relationship between <pg> and <pb>.

One of the limitations of the proposal as I see it is that the
pointing is in one direction - from the header/facsimile into the
encoded text. To point from the text to the image would require
@corresp and perhaps @coords on elements in the text, pointing to the
xml:id of the <pg> (or whatever we call it) groupings of <graphic>s in
the facsimile list. In cases where there is only one set of source
images, with image files referenced directly on <pb> using @url, this
is even simpler: one need only use @coords to point to areas
within an image.
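
A sketch of what pointing from the text to the image might look like.
The xml:id values and the content are hypothetical, and I am assuming
@coords holds an HTML-style "left,top,right,bottom" rectangle:

```xml
<!-- facsimile list (in the header, or alongside <text>) -->
<pg xml:id="pg1" start="#p1">
  <graphic url="full-page-scan-of-p1.jpg"/>
</pg>

<!-- transcription: the element points back at the page grouping -->
<pb xml:id="p1"/>
<add corresp="#pg1" coords="10,100,120,200">inserted words</add>
```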

*****************
Now on to Lou's concrete suggestions

> I haven't got very far trying to answer these questions, but as far as I
> have I'd like to suggest that
> (1) we should be thinking of defining a different element to contain a
> collection of digital images, which would be analogous to  the existing
> <text> element: let's call it <facs> for the moment. A <facs> can appear
> where a <text> (but not a <floatingText>) can in the TEI model.

As stated above, I agree that <pg> is not a good element name and that
an equivalent grouping element should be available in place of <text>,
but it should also be available in the header (preferably in
<sourceDesc>) as a facsimile is metadata and not the source itself.

> (2) It contains one or more <zone> elements defining a two dimensional
> space which is represented in the facsimile
> (3) a <zone> contains one or more <graphic> elements, each of which
> gives a visual representation of the zone in question, using differing
> scales, rotations etc.
> (4) a <zone> may also optionally contain other <zone>s, each of which
> contains a visual representation of some subset of the parent zone,
> again possibly using different scales or rotations.

I'm not sure how I feel about <zone>, especially nested <zone>s. If we
are talking about representing regions of a page (surface), I would
much prefer to talk about coordinate areas pointing to a <graphic> (or
a group of <graphic>s with some mapping system to ensure that the
coordinates point to the same regions of the different images) than
about several different <graphic>s. This is pretty much what METS does
and it is also how <area> functions in this current proposal. Any
coordinate area one would want from a page could be identified as an
<area> - there isn't a need to nest anything.
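
For comparison, here is how I read Lou's (1)-(4) next to the flat
<area> approach. File names, coordinate values and the <surface>
wrapper name are invented for illustration, and I may be misreading
the nesting Lou intends:

```xml
<!-- Lou's suggestion: nested zones, each carrying its own graphic(s) -->
<facs>
  <zone>
    <graphic url="p1-full.jpg"/>
    <zone>
      <graphic url="p1-detail-initial.jpg"/>
    </zone>
  </zone>
</facs>

<!-- Flat alternative: one graphic, any number of coordinate areas -->
<surface>
  <graphic url="p1-full.jpg"/>
  <area left="10" top="100" width="110" bottom="200"/>
  <area left="300" top="40" width="80" bottom="120"/>
</surface>
```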

Another option - which Conal has mentioned in other posts and which I
will say more about below - is to allow a @coords attribute directly
on elements in the <text> rather than having to rely on a separate
list of <area>s in the header/facsimile list.

> (5) alignment of image and text is done throughout using  existing TEI
> mechanisms -- i.e. we use @corresp to point from one to the other, or we
> use a standoff alignment map.
>
I agree that we should use existing TEI mechanisms whenever possible,
and one way to do that is to use @corresp to link from elements up/out
to <area> (or <zone>) elements that define the locations of the
content of those elements on the <pg> (or <surface>). But I do think
that we should have @coords allowed directly on elements in the <text>
(such as <damage>, <abbr>, <add>, <del>). First, people are already
doing this (Edition Production Technology does this, and I know of
several projects that are using this tool; if people are using the
tool, they will use whatever markup it requires). Second, it's
relatively simple, especially if your project only has one set of
image files. No need for separate indexes or lists. Third, @coords as
an attribute is already defined in METS (and I believe they take it
from HTML 4). So we aren't just inventing something (though that never
stopped us before).
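
For the single-image-set case, the direct approach might look
something like this. The coordinate values and content are made up,
and I am again assuming an HTML-style "left,top,right,bottom" value
for @coords:

```xml
<pb url="full-page-scan-of-p1.jpg"/>
<p>some text
  <del coords="40,310,180,335">struck-out words</del>
  more text
  <damage coords="200,310,260,400">partly legible words</damage>
</p>
```

Each set of coordinates is read against the image named on the
nearest preceding <pb>, so no separate index or list is needed.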

> Please tell me if this is far too simplistic an approach -- it doesn't
> seem to me a million miles away from the proposal we have though.
>
Well, I'm really sorry about the length of my post and I hope some of
you are still awake  :-)  I think we have the makings of a great
recommendation and I'm very sorry I haven't been more vocal as this is
something I think is vitally important to include in P5 1.0.  I hope
my comments will be useful.

Dot

> Lou

-- 
***************************************
Dot Porter, University of Kentucky
#####
Program Coordinator
Collaboratory for Research in Computing for Humanities
dporter at uky.edu          859-257-9549
#####
Editorial Assistant, REVEAL Project
Center for Visualization and Virtual Environments
porter at vis.uky.edu
***************************************