[tei-council] facsimile odd

Conal Tuohy Conal.Tuohy at vuw.ac.nz
Tue Jul 17 07:56:08 EDT 2007


Lou Burnard wrote:

> Unless I've grossly misunderstood it, Conal's proposal may be summarized 
> as follows:
> 
> a) we define a new element <pg>, a member of model.sourceDescPart
> b) we define a new attribute class, att.projection and make <graphic> a 
> member of it, along with a small number of other existing "container" 
> elements like <p>, <ab>, and <seg>. Dot also proposes an attribute @coords.

That's correct. 

The <pg> element represents a physical page.

The att.projection attributes define the location of something in terms of a coordinate space. 

In the case of tei:area, the att.projection attributes would define a rectangular area on a page, the page itself being represented by a tei:pg element. Typically the tei:pg would be the parent element of the tei:area, but because a tei:pb can be linked to a tei:pg, even elements within the body of the text have an applicable tei:pg (namely the tei:pg which is linked to the tei:pb which precedes the element).

Pretty much every other element usable within tei:text COULD be a member of att.projection, if the encoder needed to assign it a location on the page. In the sample ODD, we've merely added tei:p, tei:ab and tei:seg. Again, the coordinates of these elements are to be interpreted as locations on the page represented by the tei:pg which applies to the page break preceding those elements.

> c) we define a new element <area>, a member of model.graphicLike
> d) <pg> is used as a wrapper for one or more <graphic>s, each 
> representing a page image; it can also contain <area>s which define 
> particular zones within the page.
> e) <pg> can point into text transcript by means of special attribute 
> @start (indicates a <pb/>); <area>s point to elements in the transcript 
> using @corresp

All true.

> 
> And here, probably revealing the grossness of my understanding, are some 
> comments on each of the above points:
> 
> a) I don't think this element belongs in sourceDesc. If <pg> contains 
> the images constituting a digital facsimile, then it isn't metadata 
> about that facsimile, it *is* the facsimile. I might want to record in 
> the sourceDesc other things (e.g. where I nicked the images from) which 
> wouldn't form part of the facsimile proper.

I'd be open to this ... could it go in a new teiHeader child: facsimileDesc?

But to my mind, the tei:pg elements ARE descriptions of the source material. I could be misinterpreting this though, I admit.

> b) the class seems to combine two different kinds of attribute: ones 
> like @top and @right which define where something else is within a 
> graphic; and ones like @xscale and @rotate which define how a graphic is 
> to be rendered in a given context. I really don't understand how these 
> attributes are intended to be used though.

You are right that this is perhaps overly complicated. The idea was to allow textual elements to be identified as being oriented, e.g., vertically or diagonally on the page.

But a simplification would be to split the attributes between two attribute classes, like so:

att.projection should define scale factors and rotation. The members of this class should be the model.graphicLike elements which are children of the tei:pg elements. The projection attributes would define the mapping between the physical page and the various tei:graphic images of that page.

att.coordinates would define the location of textual elements (or tei:area elements, which are the stand-off proxies of textual elements) within a page. 
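
As a rough sketch of that split (the attribute names are the ones mentioned in this thread; the values, the file name and the #fig1 target are invented purely for illustration, and nothing here is a settled schema):

```xml
<pg start="#p1">
  <!-- att.projection: how this image maps onto the physical page -->
  <graphic url="full-page-scan-of-p1.jpg" xscale="0.5" rotate="90"/>
  <!-- att.coordinates: where something sits within the page -->
  <area left="10" top="100" right="110" bottom="200" corresp="#fig1"/>
</pg>
```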

> c) <area> doesn't make much sense except with reference to a <graphic>; 
> it can't therefore be a member of model.graphicLike, since this would 
> allow it to stand in place of a <graphic>

I would say that an <area> which was the child of a <figure>, for instance, would make sense with reference to the <graphic> images of the page: that is, the <graphic>s which are children of the <pg> which points to the <pb> which precedes the <figure>.

e.g. schematically:

<pg start="#p1">
  <graphic url="full-page-scan-of-p1.jpg"/>
</pg>

...

<pb xml:id="p1"/>

...

<figure>
   <area left="10" top="100" right="110" bottom="200"/>
   <figDesc>A square picture of something</figDesc>
</figure>

> d) <pg> seems rather restrictive (not to say unpronounceable) as a name: 
> could I use it, for example, to wrap images of Sebastian's gravestones? 

I agree. But yes, it should cover gravestones as well.

> Is the only difference between a <pg> and an <area> that one corresponds 
> with a conventional visual unit -- the page -- and the other with any 
> arbitrary subsection of it? Suppose each of my images shows a 2-page 
> spread: would each one be a <pg> with each page image being an <area>?

No, you'd have two <pg> elements, each containing a tei:graphic with the same @url value, but the graphics would have different offsets with respect to the <pg> which enclosed them. The graphic for the right-hand page would have a large negative @left offset, to indicate that a vertical line down the centre of the graphic corresponds to the left edge of that page.
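
Schematically, with made-up ids, file name and offset value, and assuming the offset on <graphic> is expressed in the same units as the other coordinates:

```xml
<!-- left-hand page: the image's left edge coincides with the page's -->
<pg start="#p2">
  <graphic url="spread-p2-p3.jpg" left="0"/>
</pg>
<!-- right-hand page: same image, shifted left by the width of one page -->
<pg start="#p3">
  <graphic url="spread-p2-p3.jpg" left="-1000"/>
</pg>
```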

> e) why two different attributes for pointing into the text? How do I 
> point from text into image?

You can either assign coordinates to fragments of text, or link them to area elements.
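
For example (ids, text and values are hypothetical, and the attribute names are only those floated in this thread), the first approach puts the coordinates directly on the transcribed element, while the second leaves the transcript alone and points at it from an <area>:

```xml
<!-- 1. coordinates assigned to the fragment of text itself -->
<pb xml:id="p1"/>
<p left="10" top="100" right="110" bottom="200">Some text on page 1</p>

<!-- 2. stand-off: an area within the page points at the text -->
<pg start="#p1">
  <graphic url="full-page-scan-of-p1.jpg"/>
  <area left="10" top="100" right="110" bottom="200" corresp="#para1"/>
</pg>
<!-- ... elsewhere, in the transcript ... -->
<p xml:id="para1">Some text on page 1</p>
```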


