17.022 a digitization robot

From: Humanist Discussion Group (by way of Willard McCarty willard.mccarty@kcl.ac.uk)
Date: Fri May 16 2003 - 02:02:48 EDT

                    Humanist Discussion Group, Vol. 17, No. 22.
           Centre for Computing in the Humanities, King's College London
                         Submit to: humanist@princeton.edu

             Date: Fri, 16 May 2003 06:57:17 +0100
             From: NINCH-ANNOUNCE <david@ninch.org>
             Subject: Stanford's Digitization Robot

    News on Networking Cultural Heritage Resources
    from across the Community
    May 13, 2003

                               Stanford's Digitization Robot
              "The Evelyn Wood of Digitized Book Scanners," By John Markoff
                               New York Times, May 12, 2003
                              (requires simple registration)

    This item in Monday's New York Times, cited by Ann Okerson on the
    liblicense list, should be of some interest to this list.

    David Green

    >Date: Tue, 13 May 2003 21:46:43 -0400 (EDT)
    >From: Ann Okerson <ann.okerson@yale.edu>
    >To: liblicense-l@lists.yale.edu
    >Reply-To: liblicense-l@lists.yale.edu
    >Monday's New York Times included an interesting article on an impressive
    >new automated scanning machine in use at Stanford University. The machine
    >literally turns the pages of books of every format while scanning to
    >graphic images or OCR text of extraordinarily high quality. The potential
    >for automating the conversion of large quantities of text is immediately
    > (requires simple registration)
    >The Evelyn Wood of Digitized Book Scanners
    >May 12, 2003
    >PALO ALTO, Calif., May 10 - Putting the world's most
    >advanced scholarly and scientific knowledge on the Internet
    >has been a long-held ambition for Michael Keller, head
    >librarian at Stanford University. But achieving this goal
    >means digitizing the texts of millions of books, journals
    >and magazines - a slow process that involves turning each
    >page, flattening it and scanning the words into a computer
    >Mr. Keller, however, has recently added a tool to his
    >crusade. On a recent afternoon, he unlocked an unmarked
    >door in the basement of the Stanford library to demonstrate
    >the newest agent in the march toward digitization. Inside
    >the room a Swiss-designed robot about the size of a sport
    >utility vehicle was rapidly turning the pages of an old
    >book and scanning the text. The machine can turn the pages
    >of both small and large books as well as bound newspaper
    >volumes and scan at speeds of more than 1,000 pages an hour.
    >Occasionally the robot will stumble, turning more than a
    >single page. When that happens, the machine will pause
    >briefly and send out a puff of compressed air to separate
    >the sticking pages.
    >For Mr. Keller, the robot, made by 4DigitalBooks, one of
    >two companies now introducing the first automated
    >digitization systems, is a boon.
    >"Think about the power of bringing our library to little
    >schools in the middle of Africa," Mr. Keller said. "Would
    >it make a difference for those who now have their minds
    >closed to the idea of democracy?"
    >The first book-scanning robots were introduced this spring
    >by 4DigitalBooks of St. Aubin, Switzerland, and Kirtas
    >Technologies of Victor, N.Y. The machines have already
    >begun to generate interest from libraries and private and
    >nonprofit groups now working to digitize books.
    >Until now, the job has been done mostly by students or
    >armies of low-cost workers in countries like India and the
    >Philippines. But manual digitization presents significant
    >logistical problems. Book collections may have to be moved
    >long distances to digitization centers.
    >And in some cases the process of scanning has damaged old
    >books and journals, making it necessary to rebind them.
    >The digitizing machines, by contrast, can be located close
    >to book collections and offer speed and quality control
    >unattainable by manual systems.
    >Even so, manual processing is still less expensive in many
    >cases than acquiring a robot. The 4DigitalBooks robot,
    >whose price neither the company nor Stanford officials
    >would disclose, becomes cost effective on projects larger
    >than 5.5 million pages, said Ivo Iossiger, the company's
    >chief technology officer and a co-founder. It seems likely
    >that the vast majority of digitization over the next
    >several years will be done by hand.
    >Mr. Keller admits that his dream to have the entire
    >Stanford library in a digital database is unlikely in the
    >foreseeable future because such an undertaking - involving
    >eight million volumes - could cost upward of $250 million.
    >In the meantime, the Stanford librarians have begun
    >digitizing books and documents where there are no thorny
    >copyright barriers and have important historical and
    >political significance.
    >The newly installed robot is currently finishing two pilot
    >projects, scanning books published by Stanford's Center for
    >the Study of Language and Information and works for the
    >Medieval and Modern Thought Text Digitization Project. It
    >will soon begin work on the 2,500 titles published by the
    >Stanford University Press.
    >Not long ago Stanford helped finance the manual
    >digitization of the presidential papers of Eduardo Frei,
    >the former president of Chile, who was concerned that
    >records of his administration could be lost in a coup.
    >And beginning in 1999, the Stanford library system sent a
    >team of specialists and students to Europe, where the
    >university is engaged in a multiyear project to digitize
    >selected documents produced by the General Agreement on
    >Tariffs and Trade and its successor organization, the World
    >Trade Organization in Geneva. The project, which will take
    >five years, will ultimately scan about 2.2 million pages of
    >Other ambitious undertakings like Carnegie Mellon
    >University's Million Book Project will also continue to
    >rely on manual digitization for several more years. Another
    >project, led by the Internet Archive in San Francisco,
    >recently shipped 80 tons of old books acquired from the
    >Kansas City Library to Hyderabad, India, where they will be
    >scanned, according to Michael Lesk, a former National
    >Science Foundation official and digital library expert who
    >works with the archive.
    >Mr. Lesk said that currently in India or the Philippines it
    >is possible to scan and digitize a book for $1 to $4. But
    >he acknowledged that there were significant costs in
    >quality control.
    >For Mr. Keller the most vexing challenges are neither labor
    >costs nor technology. Librarians, he said, must find a way
    >to address the copyright restrictions that appear to be
    >tightening as a result of new federal laws like the Digital
    >Millennium Copyright Act of 1998.
    >Stanford is struggling to comply with copyright
    >restrictions while making works that have recently lost
    >their copyright protection available digitally. Mr. Keller
    >said the library increased the circulation of its
    >collection by 50 percent when it computerized its card
    >catalog. Digitizing out-of-print books could likewise make
    >them available to a much wider audience, he said. The
    >payoff for building such a digital collection, he added, is
    >vastly improved availability of a huge store of knowledge
    >and information for teaching, learning and research.


    -----------------------------------------------------------------------
    NINCH-Announce is an announcement listserv, produced by the National
    Initiative for a Networked Cultural Heritage (NINCH). The subjects of
    announcements are not the projects of NINCH, unless otherwise noted;
    neither does NINCH necessarily endorse the subjects of announcements.
    We attempt to credit all re-distributed news and announcements and
    appreciate reciprocal credit.

    For questions, comments or requests to un-subscribe, contact the editor:
    <mailto:david@ninch.org>
    -----------------------------------------------------------------------
    See and search back issues of NINCH-ANNOUNCE at
    <http://www.cni.org/Hforums/ninch-announce/>.
    -----------------------------------------------------------------------
