Humanist Discussion Group, Vol. 18, No. 470.
Centre for Computing in the Humanities, King's College London
www.kcl.ac.uk/humanities/cch/humanist/
www.princeton.edu/humanist/
Submit to: humanist_at_princeton.edu
[1] From: Erik Hatcher <esh6h_at_virginia.edu> (42)
Subject: Re: 18.463 indexing local machines
[2] From: "Okyere, Emmanuel II" <chief_at_okyere.org> (96)
Subject: RE: 18.463 indexing local machines
[3] From: "Patrik Svensson" <patrik.svensson_at_engelska.umu.se> (41)
Subject: RE: 18.463 indexing local machines
[4] From: "Stephen Woodruff" <s.woodruff_at_arts.gla.ac.uk> (4)
Subject: RE: 18.463 indexing local machines
--[1]------------------------------------------------------------------
Date: Fri, 07 Jan 2005 10:12:57 +0000
From: Erik Hatcher <esh6h_at_virginia.edu>
Subject: Re: 18.463 indexing local machines
On Jan 6, 2005, at 2:27 AM, Humanist Discussion Group (by way of
Willard McCarty <willard.mccarty_at_kcl.ac.uk>) wrote:
>It took only about a month before I deleted the highly structured
>collection in favour of the unstructured one.
Structure is so overrated!
>Seriously, in the life of an interdisciplinary computing humanist nearly
>every intellectual object falls under so many distinct categories,
>whatever the scheme, that I cannot see any such thing working. Except,
>perhaps, for those who devote themselves to the scheme rather than to
>what it schematizes.
Consider Folksonomies :)
http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html
>Automatic indexing then became a priority. Eventually I gave up on
>Windows XP's native indexing -- the finding mechanism is too slow and
>clumsy. A visiting lecturer (may his tribe increase) drew my attention
>to X1 (www.x1.com/), which I tried out, then purchased.
What was your experience with X1?
>What have others done? What's been the experience?
I come to this party with a heavy Lucene bias. Check out my newly
launched site at http://www.lucenebook.com - it combines a "search
inside the book" feature with a blog. I'm actively evolving it (TODO
items include integrating errata into book-section search matches, and
so on).
There are several Lucene-based desktop search options, though
admittedly I have little experience with them first hand. But here are
some things to try:
* Searchblox - http://www.searchblox.com (they contributed a case
study to the Lucene book, so I know the most about this one)
* Aduna AutoFocus - http://aduna.biz/products/autofocus/index.html
* Zilverline - http://www.zilverline.org
Our Lucene book's free source code download comes with a simple text file
indexer (it crawls a directory tree, indexing .txt files only) which
could be adapted to index other types of content. The tricks would be
to enhance it to check date stamps, re-index new or changed content, and
remove documents that no longer exist, and to integrate document
parsers for HTML, Word, PDF, and other file types.
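The enhancements described above (check date stamps, re-index changed
files, drop files that have disappeared) can be sketched in a few lines.
This is only an illustrative sketch in plain Python, not the code from
the Lucene book download and not the Lucene API: the class name
TextIndex and its methods refresh and search are hypothetical, the
index lives in memory only, and just .txt files are handled.

```python
import os
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


class TextIndex:
    """Toy in-memory inverted index over .txt files (hypothetical, not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of file paths
        self.mtimes = {}                  # path -> mtime recorded when indexed

    def _remove(self, path):
        # Drop every posting that points at this file.
        for paths in self.postings.values():
            paths.discard(path)
        self.mtimes.pop(path, None)

    def _add(self, path, mtime):
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read()
        for term in TOKEN.findall(text.lower()):
            self.postings[term].add(path)
        self.mtimes[path] = mtime

    def refresh(self, root):
        """Crawl root; return how many files were added, updated or removed."""
        seen = set()
        changed = 0
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if not name.lower().endswith(".txt"):
                    continue  # other formats would need their own parsers
                path = os.path.join(dirpath, name)
                seen.add(path)
                mtime = os.stat(path).st_mtime_ns
                if self.mtimes.get(path) == mtime:
                    continue  # date stamp unchanged: keep existing entries
                self._remove(path)
                self._add(path, mtime)
                changed += 1
        # Anything indexed earlier but not seen now has been deleted.
        for path in set(self.mtimes) - seen:
            self._remove(path)
            changed += 1
        return changed

    def search(self, term):
        """Return the sorted paths of files containing the single word term."""
        return sorted(self.postings.get(term.lower(), set()))
```

Calling refresh on the same directory repeatedly keeps the index in
step with the files; search("lucene") then returns the paths of the
.txt files that currently contain that word. Support for HTML, Word or
PDF would slot in where the .txt check is made.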
Erik
--[2]------------------------------------------------------------------
Date: Fri, 07 Jan 2005 10:16:34 +0000
From: "Okyere, Emmanuel II" <chief_at_okyere.org>
Subject: RE: 18.463 indexing local machines
Willard,
I have yet to try X1, and I'm looking forward to giving it a go when Yahoo
finally releases it (Yahoo has licensed X1 and will release a free version
early this year:
http://www.usatoday.com/tech/techinvestor/corporatenews/2004-12-10-yahoo-desktop-search_x.htm?csp=34).
I have tried the MSN toolbar suite (http://beta.toolbar.msn.com/), Google
Desktop and Copernic Desktop Search (http://www.copernic.com/). I like
Copernic best: it gives me a great deal of flexibility in what I index,
it plugs a search bar into the taskbar that makes it easy to search at
any time, and it has a really nice UI. It is also free.
There's a nice roundup of things here: http://slate.msn.com/id/2111643/
Cheers,
- eokyere
---
Emmanuel OKYERE II
CTO - AKUABA, LLC
Phone/Fax: 703.815.4702
PGP Key ID: 0xA7FD6168
MSN: compubandit
AIM: compubndit
http://www.okyere.org/

| -----Original Message-----
| From: Humanist Discussion Group [mailto:humanist_at_Princeton.EDU] On Behalf
| Of Humanist Discussion Group (by way of Willard McCarty
| <willard.mccarty_at_kcl.ac.uk>)
| Sent: Thursday, January 06, 2005 2:28 AM
| To: humanist_at_Princeton.EDU
|
|
| Humanist Discussion Group, Vol. 18, No. 463.
| Centre for Computing in the Humanities, King's College London
| www.kcl.ac.uk/humanities/cch/humanist/
| www.princeton.edu/humanist/
| Submit to: humanist_at_princeton.edu
|
|
|
| Date: Thu, 06 Jan 2005 07:22:05 +0000
| From: Willard McCarty <willard.mccarty_at_kcl.ac.uk>
| Subject: indexing local machines
|
| Recently I have tried out two programs for indexing the text- and
| email-files on my local machines and one for cataloguing my images. This
| is, in effect, a query about such programs, with a long preamble on my
| experience so far.
|
| Like most others here, I suppose, I've accumulated sufficient amounts of
| texts and images to make finding what I need sometimes quite difficult.
| During 2003-4 I started a systematic and large-scale effort to accumulate
| Web-pages, PDFs and other forms of text to support my research. (The
| collection now stands at ca. 1/2GB -- it's small because I actually read
| the stuff.) At first I evolved a reasonably complex directory structure
| for these files, but soon I realised that I was spending significant
| amounts of time deciding in which of the sub-sub-subdirectories to put a
| newcomer and looking through the many such sub-sub-subdirectories for one
| I had judiciously placed somewhere not too long before. So I set up a
| parallel unstructured bit-bucket in which I put an identical copy of
| everything, with the idea of seeing which way my wind was blowing.
| I also adopted the practice of putting as many copies of newcomers in as
| many places in the highly structured collection as I thought they
| belonged.
|
| It took only about a month before I deleted the highly structured
| collection in favour of the unstructured one. Perhaps, if I had been able
| to replicate myself and my equipment a number of times, I might have
| assigned some of these imaginary selves to a cataloguing, metadata-writing
| party, but under the circumstances I could only find that notion amusing.
| Seriously, in the life of an interdisciplinary computing humanist nearly
| every intellectual object falls under so many distinct categories,
| whatever the scheme, that I cannot see any such thing working. Except,
| perhaps, for those who devote themselves to the scheme rather than to
| what it schematizes.
|
| Automatic indexing then became a priority. Eventually I gave up on Windows
| XP's native indexing -- the finding mechanism is too slow and clumsy. A
| visiting lecturer (may his tribe increase) drew my attention to X1
| (www.x1.com/), which I tried out, then purchased. A friend then told me
| about Google's Desktop free Search (desktop.google.com/), which I tried,
| then discarded: what works for the Web at large does not, in my
| experience, work well for one's private collection.
|
| Meanwhile I picked up Google's Picasa Photo Organizer
| (www.google.com/downloads/), which is as good as anything I've seen.
|
| What have others done? What's been the experience?
|
| Yours,
| WM
|
| [NB: If you do not receive a reply within 24 hours please resend]
| Dr Willard McCarty | Senior Lecturer | Centre for Computing in the
| Humanities | King's College London | Kay House, 7 Arundel Street | London
| WC2R 3DX | U.K.
| +44 (0)20 7848-2784 fax: -2980 ||
| willard.mccarty_at_kcl.ac.uk www.kcl.ac.uk/humanities/cch/wlm/
--[3]------------------------------------------------------------------
Date: Fri, 07 Jan 2005 10:18:08 +0000
From: "Patrik Svensson" <patrik.svensson_at_engelska.umu.se>
Subject: RE: 18.463 indexing local machines
Dear Willard,

This is a most interesting issue! Like you I find that unstructure works
relatively well. I have tried several programs for indexing and searching
local data. There is so much of my life in my email program that decent
search facilities are vital.

I have used X1 for quite some time now and I love it. I have about 90,000
email messages stored (basically all in-going and out-going messages from
1996 onwards) and a great deal of documents and other data (including 45
versions of my Ph.D. thesis). X1 makes it very easy to find information. I
especially like the narrowing-down-as-you-type design of the program and
the blazing speed (there is no perceivable lag). Also X1 searches
everything and is able to show many file formats directly. Other programs
I have tried, including 80-20, do not search headers when you do free text
searches. That proves to be a problem as people do not always include
their own name in the message field of emails, and name is a primary
search category. Of course these indexing programs take up some resources
doing the actual indexing but on my computers, it is hardly noticeable.

I think my fascination with these tools is partly because I think that
they make a qualitative difference. I do things now that I could not do
before and I also find myself using the search program to find email that
just arrived. It is a different kind of interface to email (and other
data) and in my mind, it is a significant step away from the mailbox
paradigm - in multiple ways I think.

It is not always that easy to make the distinction between local and
non-local data. Myself I tend to go for software that allows me to
distribute data.
For instance, I use Biblioscape to handle references and it allows me (and
others if I let them) to view, edit and search my bibliographic data from
any connected computer. I often take notes in my blog which is also
distributed (and searchable). For instant messaging I use Trillian which
allows me to store and search im conversations (logging is not totally
unproblematic here of course). X1 allows me to search network drives (lab
resources for instance) as well as local drives. I use del.icio.us
(http://del.icio.us/) to keep track and search for bookmarks (from any
connected computer) - this one is rather interesting as it allows
"unplanned" tagging and you can see how categories develop in your own
material (rather than having decided on an ontology to start with).
Moreover, you may explore how your own emergent tagging scheme coincides
with that of other users. I also find the tagging process rewarding in
itself. It helps me associate, connect and remember.

Patrik Svensson
HUMlab, Umeå University, Sweden
http://www.humlab.umu.se/patrik
--[4]------------------------------------------------------------------
Date: Fri, 07 Jan 2005 10:19:04 +0000
From: "Stephen Woodruff" <s.woodruff_at_arts.gla.ac.uk>
Subject: RE: 18.463 indexing local machines
I notice Yahoo have bought X1 (www.x1.com/) and apparently intend to offer
its technology (or maybe a subset?) free early this year, so now is not a
good time to buy.
Stephen Woodruff
Received on Fri Jan 07 2005 - 05:49:05 EST
This archive was generated by hypermail 2.2.0 : Fri Jan 07 2005 - 05:49:39 EST