[tei-council] Vault migration progress

David Sewell dsewell at virginia.edu
Fri Jan 22 10:42:27 EST 2010


I've restored the original edr01. I thought that Apache on www.tei-c.org
treated text/plain as UTF-8 by default, but apparently it does not.

The control characters I removed were all artifacts of conversion from a
word processing format to ASCII: mostly ^Z at the end of the file, some ^L
form feeds, and a handful of others. Their presence causes Apache to
treat a file as binary and offer it for download rather than
sending it as plain text, which is a nuisance.
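
For anyone who wants to repeat this kind of cleanup, here is a minimal
Python sketch. It is not the procedure actually used on the Vault files;
the function name and the choice to keep tab, line feed, and carriage
return while dropping every other ASCII control byte (including ^Z, ^L,
and DEL) are assumptions made for illustration.

```python
# Control bytes we assume are worth keeping: tab, line feed, carriage return.
KEEP = {0x09, 0x0A, 0x0D}

# Every other byte below 0x20, plus DEL (0x7F), is treated as conversion debris.
DROP = bytes(b for b in range(0x20) if b not in KEEP) + b"\x7f"


def strip_control_chars(data: bytes) -> bytes:
    """Return data with stray ASCII control bytes removed.

    Works on raw bytes, so bytes >= 0x80 (e.g. UTF-8 sequences) pass
    through untouched.
    """
    return data.translate(None, DROP)


if __name__ == "__main__":
    sample = b"page one\x0c\npage two\r\n\x1a"
    print(strip_control_chars(sample))
```

Working on bytes rather than decoded text means the sketch does not need
to know a file's encoding, which seems the safer choice for archival
files of mixed origin.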

I am updating or fixing only internal links that point to other parts of
the Vault, which all use either relative paths or a hard-coded reference
to the old www-tei.uic.edu server. I think this strikes the right
balance between current usability and the desire to preserve archival
information.
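
The rule I am describing can be sketched roughly as follows in Python.
This is an illustration only: the function name is hypothetical, and it
assumes that a path on the old server maps to the same path on
www.tei-c.org, which will not hold for every file. Relative links and
links to any other host are returned unchanged.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "www-tei.uic.edu"   # hard-coded host named in the old Vault links
NEW_HOST = "www.tei-c.org"     # current server


def rewrite_internal_link(href: str) -> str:
    """Point links at the old Vault host to the new one; leave the rest alone."""
    parts = urlsplit(href)
    # hostname is None for relative links and is lowercased for absolute ones.
    if parts.hostname != OLD_HOST:
        return href
    # Assumption: the path, query, and fragment carry over unchanged.
    return urlunsplit(
        (parts.scheme, NEW_HOST, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    for link in (
        "http://www-tei.uic.edu/orgs/tei/index.html",
        "../ED/edr01.txt",
        "http://www.uic.edu/orgs/tei/",
    ):
        print(link, "->", rewrite_internal_link(link))
```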

David

On Thu, 21 Jan 2010, Lou Burnard wrote:

> This really is quite a difficult set of issues here.
>
> First off, in the case of edr01,  I'm not certain that converting the "ugly
> ASCII" conventions was that wise, partly because you leave other aspects of
> the markup (.sk .co)  untouched, but mainly because the comment about the
> accents at the start of the text is now mystifying.
> (and the file doesn't display properly in my web browser anyway, because it
> looks like an ASCII file, not a UTF-8 one)
>
> Secondly, I wonder if it helps to distinguish form and function when
> considering the links? It's (just about) possible to preserve the function of
> internal links if we know how to decode the form they take, and we're probably
> not over concerned about the exact form they took originally anyway (though I
> suppose we might be...). It's just not possible to preserve the function of an
> external link because the thing it's pointing to either isn't there at all, or
> if it is has now moved on entirely. (OK, you might say we should point into
> places on the Internet Archive or something...)
>
> So I'm not sure. I think I agree that tidying up internal links is reasonable
> and helpful. I feel quite strongly that we shouldn't mess with external links:
> it's quite a major editorial act to say that this link is actually to this
> other document.
>
> Can't comment on the "removal of control characters" without more details.
>
>
>
>  Kevin Hawkins wrote:
> >  From an archival standpoint, I think it is best not to correct any links in
> > documents in the Vault.  That would be manipulating the historical record in
> > a way that will make it even more difficult for future researchers to
> > imagine what the authors of these documents were referring to at the time.
> > Fixing internal links only is okay if it's needed for documents to let users
> > get from one document to another, though I don't think it's actually
> > required.
> >
> > I think it's acceptable to fix character encoding (as you have) in order to
> > make these documents readable by contemporary software.  We are offering an
> > archive for users, after all -- not a preservation-quality bitstream.
> >
> > Kevin
> >
> > David Sewell wrote:
> > > Council-folk,
> > >
> > > I have been working on the migration of the TEI Vault to the Virginia
> > > server at www.tei-c.org. The files are in place and basically all
> > > working. I spent some time cleaning up control characters in text files
> > > (which often cause Apache to treat them as binary), tweaking MIME content
> > > types, and as a gift to Lou converting some ugly ASCII-7 French
> > >
> > > http://projects.oucs.ox.ac.uk/teiweb/Vault/ED/edr01.txt
> > >
> > > to UTF-8
> > >
> > > http://www.tei-c.org/Vault/ED/edr01.txt
> > >
> > > I am fixing broken internal links, where possible, but I'd like to know
> > > what Council wants done with broken external ones. For example, most of
> > > the links on this page
> > >
> > > http://www.tei-c.org/Vault/ED/edr14.html
> > >
> > > are invalid. The ones to www.uic.edu can mostly be mapped to current
> > > pages on www.tei-c.org. Undoubtedly many external links are gone
> > > forever.
> > >
> > > Should I take the time to fix those, where possible? The number is not
> > > huge, ca. 60 in all. Users will not expect all the links to be working
> > > but will be pleased if they are.
> > >
> > > Please copy me on any reply as I am not subscribed to Council list now,
> > >
> > > David
> > >
>
>

-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: dsewell at virginia.edu   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/

