[tei-council] encoding page scans

Christian Wittern wittern at kanji.zinbun.kyoto-u.ac.jp
Fri Dec 16 02:48:51 EST 2005


Lou Burnard <lou.burnard at computing-services.oxford.ac.uk> writes:

>>> <snip/> The main thing, to my mind, is that the guidelines should
>>> clearly document some standard practice that doesn't involve
>>> treating page scans as figures, or making custom extensions to the
>>> TEI schema.
>>>
>> yes, yes, yes

Yes.  What we do need in the Guidelines is some recommendation on how
to do this.  It is a common requirement, and it would be helpful if we
showed people how to do it in a way that is likely to be similar to
what their neighbours are doing.  I think we might want to have one
quick-and-dirty way of doing this, which I think is what Conal is
asking for, and which might be done completely within TEI.  Digital
Libraries and the like might want to adopt a more sophisticated
strategy, where METS gets into the game; it would be helpful if we
documented a recommended way for that too.  I see a task force here
as well.

> No no no! In P5 we have a generic <graphic> and a generic <bitmap>
> element which could surely be used to mark the presence of a pagescan
> if you like. Whether you want to wrap them up in a (P5) <figure>  or
> in something else is a different question, of course. Also, at P5, you
> can't use the TEI without doing some kind of "custom extensions", so
> there's nothing to be afraid of there!

Yeah, quite right in the general case.  Nevertheless, we do want to
continue to give recommendations on best practice and the like.
Common requirements like this *should* be discussed in the Guidelines. 

>
>> <structMap LABEL="Marginal Scholia">
>>   <div LABEL="Book 4">
>>     <div ID="Am.4.1">
>>       <fptr>
>>         <area FILEID="id-2001.01.0092" COORDS="368,842,1600,1656"/>
>>       </fptr>
>>       <fptr>
>>         <area FILEID="id-sch-4" BEGIN="Am.4.1"/>
>>       </fptr>
>>     </div>
>> FILEID links up to the file groups; COORDS records the coordinates
>> on the image file; BEGIN records the xml:id of the line in the TEI
>> file (multiple lines would have both BEGIN and END).
>
> This seems to confuse two different things: the first <fptr> is
> pointing to an area, and second one to a sequence of lines, isn't
> it?

Right.  And the point is to say that these two components do
correspond to each other.
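
To make that pairing concrete, here is a minimal Python sketch of how a
consumer might read the quoted structMap and recover the correspondence
between image region and text range.  It is only an illustration under
several assumptions: the fragment is closed off with </div> and
</structMap> so that it parses, the METS namespace is left out as in the
quoted excerpt, and the function name and dictionary keys are invented
for this example.

```python
import xml.etree.ElementTree as ET

# The structMap fragment quoted above. The closing </div> and </structMap>
# are added here (an assumption) so that the excerpt is well-formed.
# Real METS files are namespaced; the quoted excerpt is not, so no
# namespace handling is done in this sketch.
METS_FRAGMENT = """
<structMap LABEL="Marginal Scholia">
  <div LABEL="Book 4">
    <div ID="Am.4.1">
      <fptr>
        <area FILEID="id-2001.01.0092" COORDS="368,842,1600,1656"/>
      </fptr>
      <fptr>
        <area FILEID="id-sch-4" BEGIN="Am.4.1"/>
      </fptr>
    </div>
  </div>
</structMap>
"""


def correspondences(xml_text):
    """Yield one record per <div> that pairs an image area with a text range."""
    root = ET.fromstring(xml_text)
    for div in root.iter("div"):
        areas = [fptr.find("area") for fptr in div.findall("fptr")]
        areas = [a for a in areas if a is not None]
        # The area carrying COORDS points into the page image;
        # the area carrying BEGIN points into the transcription.
        image = next((a for a in areas if a.get("COORDS")), None)
        text = next((a for a in areas if a.get("BEGIN")), None)
        if image is None or text is None:
            continue  # e.g. the outer "Book 4" div, which has no <fptr>
        yield {
            "div": div.get("ID"),
            "image_file": image.get("FILEID"),
            "coords": tuple(int(n) for n in image.get("COORDS").split(",")),
            "text_file": text.get("FILEID"),
            "begin": text.get("BEGIN"),
            # A single line has only BEGIN; treat END as the same id then.
            "end": text.get("END", text.get("BEGIN")),
        }


PAIRS = list(correspondences(METS_FRAGMENT))
for pair in PAIRS:
    print(pair)
```

The sketch relies on exactly the convention described in the quoted
explanation: the two <fptr> children of one <div> are read as "this
region of the image corresponds to this stretch of text".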


> In P5, you can use a URI to point in either case, so the syntax is
> likely to be a lot less verbose tho not particularly more perspicuous.
>
> Then you could either use the corresp attribute to say that the two
> things correspond, or a standoff <link> to associate them.
>

I still think that for these cases there is no good reason to
re-invent METS.

>
> Another thing to throw into the pot: There are formats like DjVu
> which embed in a single object a compressed page image and an XML
> transcript of it. Anyone got any views on how that affects these
> issues?
>

Now, here we do have a good case for a customization.  My djvu.odd
would most likely just add some @coords to every <w> element.  But you
would still need a place to put the image location.  I recently had a
brief look at ScanSoft's OmniPage, which produces (horrible, horrible)
XML that contains all this information and is just crying out to be
domesticated in TEI.
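
As a rough idea of what such a domestication step could look like, the
Python sketch below turns a list of OCR word boxes into <w> elements
carrying a coords attribute.  Everything specific in it is an
assumption made for illustration: the input tuples are invented
placeholder data rather than real OmniPage or DjVu output, the
"left,top,right,bottom" value format simply mirrors the METS COORDS
attribute quoted earlier, and wrapping the words in an <l> with an
xml:id is just one possible choice.  It does not address where the
image location itself should go.

```python
import xml.etree.ElementTree as ET

# Namespace of xml:id; ElementTree serializes it with the "xml" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical OCR output: (word, left, top, right, bottom) in pixels.
OCR_WORDS = [
    ("lorem", 368, 842, 520, 900),
    ("ipsum", 540, 842, 810, 900),
]


def line_with_coords(line_id, words):
    """Build an <l> element whose <w> children each carry a coords attribute."""
    line = ET.Element("l", {f"{{{XML_NS}}}id": line_id})
    for word, left, top, right, bottom in words:
        w = ET.SubElement(line, "w", {"coords": f"{left},{top},{right},{bottom}"})
        w.text = word
    return line


LINE = line_with_coords("Am.4.1", OCR_WORDS)
XML_TEXT = ET.tostring(LINE, encoding="unicode")
print(XML_TEXT)
```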

All the best,

Christian

-- 

 Christian Wittern 
 Institute for Research in Humanities, Kyoto University
 47 Higashiogura-cho, Kitashirakawa, Sakyo-ku, Kyoto 606-8265, JAPAN


