[tei-council] Tite and conformance

Daniel Paul O'Donnell daniel.odonnell at gmail.com
Fri Jul 10 11:03:00 EDT 2009


I think, James, that if the digitization benefit goes through, we are 
going to have to return to Tite seriously on Council. Tite was very much 
a bottom up initiative and I think your comments are really touching on 
some important issues: at Council, I think we tend to do a better job 
thinking out the general implications than individuals often do.

If the benefit goes forward and becomes a popular thing--and at the 
prices we are looking at from the vendors, our original survey suggests 
it will--then we will have an obligation to make sure that the 
underlying language used for that benefit is the best possible for the 
task. This might mean adding features like you mention here below. It 
might also mean really investigating the question of the keystrokes 
(whether it really is significant and, if it is, whether they can be 
reduced more rationally). It might mean fundamentally restructuring things.

I think the way to understand Tite is that it was a community-developed 
extension based on existing customisations used by libraries engaged in 
keyboarding which the TEI then adopted in order to develop this 
membership benefit. Now that we are using it as an organisation, we 
ought to be looking for ways that we can improve it. And I think there 
is considerable room for improvement.

I think we will see more of this kind of process happening, frankly, as 
the SIGs seem to be becoming increasingly active and interested in 
producing things. So we will need to learn how to deal with imperfect 
material that comes to us with community momentum and learn how to guide 
and polish the development of that material. Given that Tite also has 
raised issues of interest for the i18n capabilities we built into P5 and 
is showing (through its flaws) the advantage of new elements and ways of 
doing things like facsimile, it is even useful for our regular 
deliberations on P5.

So I think we are all really starting to hit things right by thinking of 
how Tite could be improved. It came to us in a form we would probably 
not have designed ourselves. But we can certainly make it better and 
potentially do our membership a service.

James Cummings wrote:
> David Sewell wrote:
>   
>> Just one comment on the TEI Header issue. Given a strong enough 
>> statement to the effect that a TEI Tite document is intended for initial 
>> keyboard capture and is in a non-archival, nonconformant TEI format 
>> where post-processing is expected, I don't think it is horrible that 
>> <text> is permitted as a root element.
>>     
>
> I agree that Tite users would probably just abuse a header if they had 
> one.  I suppose it isn't *just* the removal of the header that bothers 
> me.  When Tite was first conceived a valid TEI document consisted of a 
> teiHeader and a text element. That is no longer the case.  TEI documents 
> can now consist of a teiHeader, a text _or_ a facsimile (or fsdDecl, but 
> let's not go there...)
>
> My point is that large digitisation projects are often working from processed 
> images where they've started with OCR and have hordes of students 
> marking up (with a graphical interface) and proofreading and correcting 
> texts.  Alternatively sometimes when they are double-keying they are 
> doing it against the images and there is at least a page/image 
> relationship.
>
> But the current setup of Tite does not allow them to have a <facsimile> 
> element to preserve this information.  Output from OCR programs or 
> formats like OmniPage, FineReader, DjVu in their XML forms actually 
> have every word marked up with corresponding co-ordinates.  (I've 
> written some XSLT to change DjVu to TEI facsimile for example.)
>
> But exactly the people who are in a position potentially to preserve 
> this information are unable to under the format that we're suggesting to 
> them.  If I were suggesting a reDesign of Tite, it would include a 
> wrapper element (TEI? Something else?) around the <text> so as to also 
> allow a parallel <facsimile>.  But I'm not suggesting a reDesign of 
> Tite.  I just thought I'd point out that having <text> as a root has 
> other implications.
>
> -James
>   

-- 
Daniel Paul O'Donnell
Associate Professor of English
University of Lethbridge

Chair and CEO, Text Encoding Initiative (http://www.tei-c.org/)
Co-Chair, Digital Initiatives Advisory Board, Medieval Academy of America
President-elect (English), Society for Digital Humanities/Société pour l'étude des médias interactifs (http://sdh-semi.org/)
Founding Director (2003-2009), Digital Medievalist Project (http://www.digitalmedievalist.org/)

Vox: +1 403 329-2377
Fax: +1 403 382-7191 (non-confidential)
Home Page: http://people.uleth.ca/~daniel.odonnell/



