[tei-council] Von Braun toc with some content
Peter Boot
pboot at xs4all.nl
Wed Aug 12 17:33:02 EDT 2009
This is what I imagine the body of the response might look like. Your
thoughts are welcome.
Peter
1. Introduction
TEI is a standard that has been successfully and widely used in the
digital transcription of texts from many periods. It has been used both
for mass digitisation in digital libraries and for digitisation of
literary manuscripts.
It is successful for a number of reasons:
* it contains modules both for very common textual features, such as
lists or tables, and for specialised features such as linguistic
analysis. Recently a module for technical documentation was developed
for ISO. Where new features are necessary, the system can be easily
extended;
* it focuses on the creation of an application-independent digital
representation of the source document. Because the representation does
not depend on the capabilities of specific software, the representation
will outlast the capabilities of today's software and can be used for
many different purposes;
* [more bragging]
The basic idea is that each document is encoded as an XML file. (A
glossary at the back of this document explains technical terms.) This
XML file contains the full text (typed and hand-written) and describes
the structure of this text: it defines the hierarchical structure that
groups the individual notes, specifies features like underlining, and
identifies, for example, the person who wrote a particular piece of
text. The XML file also contains pointers to the files that contain the
page images. A document header contains, among other things, the
meta-information that is necessary for cataloguing: author, date,
information about attachments, etc.
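To make the previous paragraph concrete, here is a minimal sketch in
Python of what one such file might contain. It is only an illustration:
the title, author, image path, sample text and the "#annotator"
identifier are all hypothetical, and the choice of elements (surface and
graphic for the page image, pb to point at it, hi for underlining, add
with a hand attribute for a hand-written comment) is one plausible
encoding, not an agreed recommendation.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Return the namespace-qualified name of a TEI element."""
    return "{%s}%s" % (TEI_NS, tag)


def build_sample_note():
    """Build one hypothetical note: header, page image pointer, text."""
    root = ET.Element(tei("TEI"))

    # Header: the meta-information needed for cataloguing.
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = "Sample weekly note"
    ET.SubElement(title_stmt, tei("author")).text = "Hypothetical Author"
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Illustrative sample only."
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = (
        "One typed page with a hand-written comment."
    )

    # Facsimile: pointer to the file that contains the page image.
    facsimile = ET.SubElement(root, tei("facsimile"))
    surface = ET.SubElement(facsimile, tei("surface"), {XML_ID: "page1"})
    ET.SubElement(surface, tei("graphic"), {"url": "images/page1.jpg"})

    # Text: structure, underlining, and who wrote which piece of text.
    text = ET.SubElement(root, tei("text"))
    body = ET.SubElement(text, tei("body"))
    div = ET.SubElement(body, tei("div"), {"type": "note"})
    ET.SubElement(div, tei("pb"), {"facs": "#page1"})
    p = ET.SubElement(div, tei("p"))
    p.text = "The schedule is "
    hi = ET.SubElement(p, tei("hi"), {"rend": "underline"})
    hi.text = "unchanged"
    hi.tail = ". "
    # "#annotator" is a made-up identifier for the annotating hand; it
    # would have to be declared somewhere in the header.
    add = ET.SubElement(p, tei("add"), {"hand": "#annotator", "place": "margin"})
    add.text = "Please confirm."
    return root


if __name__ == "__main__":
    print(ET.tostring(build_sample_note(), encoding="unicode"))
```

Running the script prints the XML for the sample note. A real file
would of course be produced by transcribers and validated against the
project schema, not built in code like this.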
A phase of document preparation will thus result in a collection of XML
files. We describe the workflow of this process in section 2 of this
response. A number of possible components of these files are presented
in section 3. Section 4 discusses how a working system can be created on
the basis of the XML files. Based on these discussions, in section 5 we
address the specific questions formulated in the Request for
Information. Section 6 contains pointers to a number of web sites that
present different sorts of documents based on the technologies we
advocate here. Finally, section 7 is a glossary that explains technical
terminology.
[Do we need to say why we are interested in this? If so, I'd say our
main interest is seeing that they use the proper technology. We want
that because we believe it's best for everyone, and also because of the
publicity value it would have for us.]
2. Workflow for document preparation
This might consist of the following phases:
(1) High-quality digital photography (the samples on the web show some
scans where part of the page is missing)
(2) Creation of an inventory of all pages: what pages are there, what
are their dates and authors, to what sets of notes do they belong, are
they notes proper or attachments to notes, are they possibly duplicates
of other pages, etc.
[To me this seems to call for a simple database; from that database, the
outline of the TEI documents (basic headers, facsimile section, pb
elements) can then be generated]
(3) Creation of guidelines for the desired encoding
This will involve weighing the desiderata that emerge from study of the
material against technical possibilities, available funding and time.
(4) Transcribing typed content, presumably by outsourcing it overseas
(5) Transcribing hand-written notes
(6) Extending the encoding with more complex phenomena, such as internal
references, indexing, identifying persons and projects
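The inventory idea in phase (2) could be sketched roughly as follows, in
Python with an in-memory SQLite database. Everything specific here is an
assumption made for illustration: the table layout, the column names,
the id scheme for pages, and the decision to emit one div per page. The
point is only that basic headers, the facsimile section and the pb
elements can be generated mechanically before any transcription starts.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Return the namespace-qualified name of a TEI element."""
    return "{%s}%s" % (TEI_NS, tag)


def create_inventory(rows):
    """Create a hypothetical page inventory; one row per photographed page."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page (image TEXT, note_set TEXT, seq INTEGER,"
        " author TEXT, date TEXT, kind TEXT)"
    )
    conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def outline_for_set(conn, note_set):
    """Generate the TEI outline for one set of notes from the inventory."""
    pages = conn.execute(
        "SELECT image, seq, author, date, kind FROM page"
        " WHERE note_set = ? ORDER BY seq",
        (note_set,),
    ).fetchall()
    if not pages:
        raise ValueError("no pages inventoried for %r" % note_set)

    root = ET.Element(tei("TEI"))

    # Basic header, filled from the inventory columns.
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = "Weekly Notes, " + note_set
    # dict.fromkeys keeps each author once, in order of first appearance.
    for author in dict.fromkeys(row[2] for row in pages):
        ET.SubElement(title_stmt, tei("author")).text = author
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Generated outline, not yet transcribed."
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = "Dated " + pages[0][3]

    # Facsimile section and one pb per page, linked by a generated id.
    facsimile = ET.SubElement(root, tei("facsimile"))
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for image, seq, _author, _date, kind in pages:
        page_id = "%s-p%d" % (note_set, seq)
        surface = ET.SubElement(facsimile, tei("surface"), {XML_ID: page_id})
        ET.SubElement(surface, tei("graphic"), {"url": image})
        # kind is "note" or "attachment" in this made-up inventory.
        div = ET.SubElement(body, tei("div"), {"type": kind})
        ET.SubElement(div, tei("pb"), {"facs": "#" + page_id})
    return root
```

A transcriber would then open the generated file and fill in the text
after each pb. Duplicate detection and multi-page notes are left out of
the sketch; they would need more columns in the inventory.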
3. TEI components
This section will explain that a TEI schema can be created that contains
just those components that NASA decides to use. It will explain some of
the available components, but only very briefly, and relate the
different encodings to what they mean in terms of enhanced access. It
will also explain ODD in qualitative terms.
4. Possible technical architecture of a working system
[This section would discuss what to do once the XML has been created.
I'd stress there are multiple options, e.g. Cocoon + stylesheets + Lucene
(or eXist). Mention some of the options from Lou's presentation at
http://tei.oucs.ox.ac.uk/Oxford/2007-02-13-oucs/talk-publishing.xml]
5. Approach to concepts
This would answer NASA's specific questions, in so far as we have answers:
[1. How should NASA catalogue the Weekly Notes? Do you have specific
ideas on how to implement the approach or strategy?
2. What format(s) should the Weekly Notes be available in?
3. How should the Weekly Notes be indexed?
4. What timeframe do you expect this work to require?
5. What other strategies or approaches do you recommend that NASA pursue
that would contribute to successful cooperation between NASA and other
entities to create a successful and useful product from the Weekly
Notes? Could these notes form the basis for understanding management
best practices? Could engineering design and operational considerations
be derived from these notes? Could these notes form the basis for formal
classroom training? ]
6. Links
Links to sample projects, to the Guidelines, and to some introductory
material.
7. Glossary