18.506 computing and composition

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Fri, 21 Jan 2005 09:08:16 +0000

               Humanist Discussion Group, Vol. 18, No. 506.
       Centre for Computing in the Humanities, King's College London
                   www.kcl.ac.uk/humanities/cch/humanist/
                        www.princeton.edu/humanist/
                     Submit to: humanist_at_princeton.edu

         Date: Fri, 21 Jan 2005 09:02:18 +0000
         From: "Donald Weinshank" <weinshan_at_cse.msu.edu>
         Subject: RE: 18.499 computing and composition

ON CLEANING UP HTML FILES
David Reed wrote:
---------------------------------------------
In a message dated 1/19/2005 3:56:55 PM Mountain Standard Time,
willard.mccarty_at_KCL.AC.UK writes:
>(3) does anybody know about a program that'll strip out the
>useless code from a M$Word-created HTML file? (as a plain ascii
>file the text in question is about 17K; in its full flower, as
>published to HTML by Word, it's 48K). (By the way, I've tried
>M$Word's "filtered" HTML and Dreamweaver's HTML cleanup.
>Neither touch the mess.)
>

I use a little program called web2text that handles just about
everything. You do have to do a little clean up on the quotes and dashes
in most cases.

David Reed
---------------------------------------------

One simple approach is to save Word files as RTF (Rich Text Format). This
strips out most of the junk. I then import the RTF file into an HTML editor
(FrontPage, for example) and publish from there.
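
For anyone who would rather script the cleanup than round-trip through RTF,
here is a minimal Python sketch. It is not the method used by web2text,
FrontPage, or anyone in this thread; it assumes the clutter is the usual
Word output (conditional comments, <o:p>-style namespaced tags, <style> and
<xml> blocks, and class/style/lang attributes). The function name
strip_word_markup is hypothetical, and regular expressions are a blunt tool
for HTML, so treat the result as a first pass to be checked by eye.

```python
import re

# Attributes Word sprinkles on nearly every tag (assumed list, extend as needed).
_ATTR = re.compile(
    r"""\s+(?:class|style|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.I,
)


def strip_word_markup(html: str) -> str:
    """Remove common Word-specific clutter from an HTML string."""
    # Conditional comments and their contents: <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html,
                  flags=re.S | re.I)
    # Bare <![if ...]> / <![endif]> markers: drop the markers, keep what is between.
    html = re.sub(r"<!\[(?:if[^\]]*|endif)\]>", "", html, flags=re.I)
    # Any ordinary comments that remain.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    # Whole <style> and <xml> blocks.
    html = re.sub(r"<(style|xml)\b[^>]*>.*?</\1\s*>", "", html,
                  flags=re.S | re.I)
    # Namespaced tags such as <o:p> or </o:p>; any text inside them is kept.
    html = re.sub(r"</?[a-z]+:[^>]*>", "", html, flags=re.I)
    # Strip class/style/lang attributes, but only inside opening tags.
    html = re.sub(r"<[a-z][^>]*>", lambda m: _ATTR.sub("", m.group(0)), html,
                  flags=re.I)
    # Deliberately blunt: remove every <span> tag, keeping the text it wrapped.
    html = re.sub(r"</?span\b[^>]*>", "", html, flags=re.I)
    return html


sample = (
    '<p class=MsoNormal style="margin:0"><span lang=EN-GB>Hello'
    "<o:p></o:p></span></p>"
    "<!--[if gte mso 9]><xml><w:WordDocument/></xml><![endif]-->"
)
print(strip_word_markup(sample))
```

Quotes and dashes will probably still need the manual touch-up David Reed
mentions, since this sketch does nothing about character encoding.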

_________________________________________________
Dr. Don Weinshank Professor Emeritus Comp. Sci. & Eng.
1520 Sherwood Ave., East Lansing MI 48823-1885
Ph. 517.337.1545 FAX 517.337.1665
http://www.cse.msu.edu/~weinshan
Received on Fri Jan 21 2005 - 04:16:26 EST
