[tei-council] linking text to image, facsimiles -- what to do?

Lou Burnard lou.burnard at computing-services.oxford.ac.uk
Thu Jul 26 05:09:25 EDT 2007


I am short of time today (have to write a paper for a conference this 
weekend) so I am just going to respond very quickly to what I see as the 
major points of Syd's posting.

1. an image does not have multiple readings

In my sense of the word "reading" it does. For example, a black and 
white image is a different "reading" from a colour one. You can also 
talk about a reading such as "woman holding baby" as distinct from 
"madonna and child". These two different kinds of difference nicely 
parallel the sorts of difference we are well used to in encoding text 
e.g. "this is a placename" vs "this is a reference to Botswana"

2. Why are we the TEI doing this?

Because two members of the Council have put a lot of work into it, 
reflecting their perceptions of the needs of the Community. Because 
there is ample evidence that users of the TEI want to produce facsimile 
editions within one single encoding framework. Because one of the great 
things about digitization/encoding is breaking down media-specific 
barriers.

3. image information in the header?

Here at least we seem to be in agreement: it doesn't belong in the 
header. It doesn't belong in the body either, though, since an image *in* 
the text is not the same as an image *of* the text. So we do need some 
kind of <facsimile> type element, analogous to <text>.
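
A minimal sketch of that arrangement, assuming a <facsimile> element that sits beside <text> as a sibling. The element name comes from this thread; what goes inside it is undecided, so the comments below are placeholders rather than a proposal:

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- metadata about the edition -->
  </teiHeader>
  <facsimile>
    <!-- images *of* the text: hypothetically one entry per page image;
         the content model here is still to be decided -->
  </facsimile>
  <text>
    <!-- the transcription; any image *in* the text stays here -->
  </text>
</TEI>
```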

4. what's wrong with mets?

Nothing (except that I don't know much about it). But nobody has 
proposed re-inventing a place for special metadata about images, have 
they? The idea is to invent a home for the images. If it's outside the 
header (vide supra) then you can put your image metadata inside the 
header if you want to, or have it wherever METS puts it. 
Unless I'm mistaken, part of the original charge for Conal and Dot was 
to consider whether or not we should hand over the entire problem to 
METS, and their conclusion, which the Council agreed to lo these many 
many months ago, was that we shouldn't. If you want to reverse that 
decision you need to come up with a rather more convincing case, and 
some specific alternatives.

5. don't use corresp

I don't feel very strongly on this one. It seems to me @corresp has 
exactly the right semantics, but it's not inconceivable that you might 
also want to use it for, say, a translation, in which case you'd be 
stuck. So let's invent another attribute -- how about @facs -- for this 
specific kind of correspondence.
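
A rough sketch of how that might look, assuming the attribute is called @facs and takes a plain URI. The image file names, the xml:id and the #fr-p1 target are invented for illustration. The point is only that @corresp stays free for other correspondences, such as a translation:

```xml
<!-- Hypothetical sketch: facs is the attribute name floated above;
     file names and identifiers are invented for illustration. -->
<body>
  <!-- each page break points at the image of the page that follows -->
  <pb n="1" facs="page001.jpg"/>
  <!-- corresp remains available, e.g. for a parallel translation -->
  <p xml:id="en-p1" corresp="#fr-p1">Transcription of the first page ...</p>
  <pb n="2" facs="page002.jpg"/>
  <p>Transcription of the second page ...</p>
</body>
```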

6. don't do clever image mapping stuff

I am open minded on this. I'm willing to trust to C&D's expertise to get 
this right eventually; they've thought about it more than I have 
certainly, and it's not their fault that the TEI editors have been too 
slow in reacting to their ideas and helping them develop them further.

I stand by the oft-repeated assertion that we *must* have something in 
P5 1.0 on this topic. We won't be crucified for failing to include a 
special tag for postscripts. We will be for not recognising that people 
want to produce digital objects containing page images as well as 
transcriptions.

Must dash...

Lou

Syd Bauman wrote:
> First, some thoughts on specifics:
> 
> Whatever we end up doing, we should not be using corresp= as part of
> a system for linking text to images. In fact, I don't think we should
> make any specific recommendations that rely on the use of corresp=.
> (Examples of it being used are fine, of course.) It is a general-
> purpose attribute whose specific semantics should generally be left
> to the user. In any case where it would be useful to suggest it as a
> mechanism in the Guidelines, we should probably be coming up with a
> specific-purpose attribute, instead.
> 
> 
> various> [facsimiles should be in <sourceDesc>]
> LB> [No, it should not]
> LB> The syllogism seems to be: ... distinct from the source it
> LB> purports to represent.
> 
> I agree with everything Lou has said up to this point in his posting
> pretty emphatically. A facsimile is not metadata about the source,
> and even if it were, it isn't a *description* of the source in any
> way, shape, or form, and thus does not belong in <sourceDesc>.
> 
> 
> LB> In fact, it seems to me the relationship between a transcription
> LB> of a source and the original is almost identical to the
> LB> relationship between a facsimile of it and the original.
> 
> I am not nearly as convinced of this statement. I am convinced that a
> transcription requires interpretation. While I am open to discussion
> on the subject, my gut instinct is that a scanned page image does
> not. However, I'm not sure we need to agree on this philosophical
> issue to move forward.
> 
> 
> LB> One translates a reading letter by letter, and the other
> LB> translates a reading dot by dot, but they are both readings and
> LB> as such I would like to give them the same ontological standing
> LB> in my encoding.
> 
> Perhaps I am misunderstanding what you mean here, Lou, or not
> understanding some nuance, but this seems at least wrong, if not
> blasphemous. I don't think of a facsimile (i.e., a page image) as a
> reading at all -- it is read, but it is not a reading. An encoded
> transcription is one or more readings.
> 
> 
> LB> Surely the TEI ought to be offering a way of encoding digital
> LB> facsimile editions as well as digital transcriptions?
> 
> No, this does not seem like a given at all. It may be something we
> want to do in the end, but it is not obvious. The "T" in "TEI" stands
> for *text*. Do we want to become the Text and Images Thereof Encoding
> Initiative? I dunno.
> 
> 
> LB> Wouldn't it be nice to offer a way of growing one kind into the
> LB> other without doing violence to the basic TEI model?
> 
> Maybe. 
> 
> 
> LB> If we are going to have markup which describes the page images
> LB> themselves, as digital objects, then the set of page images
> LB> constituting a work is a kind of "text" itself and should be
> LB> treated as such.
> 
> With this we would end up with TEI files that look like
> 
> <TEI xmlns="http://www.tei-c.org/ns/1.0">
>   <teiHeader>
>     <!-- ... -->
>   </teiHeader>
>   <facsimile>
>     <!-- ... <pg> or <zone> or whatever goes here -->
>   </facsimile>
> </TEI>
> 
> But a <teiHeader> is explicitly, and mostly carefully, crafted to
> contain metadata about *text*. There is lots of metadata about
> *images* for which it is currently ill-suited.
> 
> 
> 
> 
> I know that this has been brought up before, but I'm afraid I need to
> be reminded ... what is it that METS doesn't do that makes us want to
> reinvent this wheel? I am (very) far from an expert in METS, but it
> is worth noting that METS is designed not only to describe the
> relationships between (e.g.) TEI elements and facsimile images, but
> to be a mechanism for storing metadata about the images.
> 
> 
> The suggestion of developing a complete mechanism for describing
> facsimiles and providing linkages, and to have it replace <text> is
> not a small invisible-to-the-user change like fixing the name of a
> class; this also doesn't seem like an obvious improvement right up
> the TEI's alley like mechanisms for encoding postscripts, manuscript
> descriptions, or physical bibliographic information. I'm worried that
> this is even more than a small addition that "pushes the envelope"
> like personagraphy does.
> 
> I'm not saying that the TEI should not do this. I am suggesting that
> we should not rush into it without more care than can be applied in
> the remaining week. I think this is a 1.1 or 1.2 issue.
> 
> 
> That said, I think having a *simple* mechanism that correlates page
> breaks (and perhaps other milestone elements) to a single scanned
> image of the page (that follows the <pb>) is pretty much a
> requirement for 1.0. Quite a few users have mentioned their desire
> for this to me, one of whom is Martin Mueller; thus I think of this
> as a solution for "the Martin Muellers of the world".
> 
> Again, it is not ideal for us to recommend "use corresp=" (although
> even that is better than nothing), as there may well be other
> correspondences that apply to <pb>s which a user would want to
> encode.
> 
> This mechanism should NOT: 
> * permit linking to multiple images (thumbnails, high-res, Xray,
>   etc.)
> * permit linking to a portion of an image above and beyond what one
>   can do in a standard URL (i.e., w/ XPointer)
> * require indirection to get to the image (beyond perhaps resolving a
>   URN to a URL)
> * require table-lookup, decoding, or use of NOTATIONs (except of
>   course resolution of entity references as required by XML, and of %
>   escaped characters as per URLs)
> * use more than one attribute
> 
> This mechanism SHOULD:
> * be expandable -- if we decide to go with a system like Conal's, it
>   would be nice if we could make use of this simple mechanism and
>   build upon it, if it seemed like the right thing to do; if we
>   decide to add cRef= like shortcuts later, we should be able to
> 
> Personally, I think it should just be an attribute that is declared
> as a single data.pointer that points to the image. No MIME type, no
> indirection, no regular expression to resolve to get the URI. Just a
> pointer -- a simple URI. We just have to name the attribute such that
> its semantics could be expanded in the future, if we want.
> 
> 
> I hope this post makes sense. G'night.
> 



