A few snippets to try and get back the blogging mojo…
Several eons ago (1988 to be precise) I did quite a bit of work with OCR, using the then revolutionary Palantir CDP-3000 scanner and ZyIndex to build a full-text searchable database of theses and dissertations created by Mason students. It was really a proof-of-concept pilot (one of our buzz-phrases was something about “trading disk space for shelf space”) but when the grant funding ran out, we reluctantly decided to pull the plug on the project. By then it had become quite clear that the technology of the day just wasn’t up to the task.
Well, that’s not fair to the Palantir (the name taken from Tolkien’s “seeing stone”). That machine (later called the Calera CDP-3000) was an amazing and revolutionary OCR device. With five Motorola 68000 CPUs, a Nikkor lens system, built-in ethernet and internal software that introduced now-common OCR advances like “feature extraction” and “omnifont recognition,” it was easily the most powerful OCR system of its day. Unfortunately, due to the cost of the setup (I seem to recall it was around $12,000) it never really caught on with the general public. As an aside, I do recall the VAR we purchased our unit through telling me he had just closed a deal selling 50 of the units to the CIA and another batch to the State Department.
All this came back to me this week after I installed a simple little $350 device that actually does deliver on the promise of the paperless office. The Fujitsu ScanSnap is just an amazing little machine. You feed paper in the top and it spits out PDFs at the rate of roughly 15 pages per minute. The unit has a copy of Acrobat bundled along with it but I chose DEVONthink Pro Office as the “driving application.” DEVONthink Pro Office includes a license from Readiris for OCR (limited to PDFs of no more than 50 pages) which enables this workflow:
- Put document(s) in sheet feeder slot and press “scan” button (will scan both sides in one pass)
- ScanSnap Manager application (from Fujitsu) builds a PDF from the image file.
- DEVONthink Pro Office performs OCR on the PDF image file, indexes the “invisible” text layer added to the PDF, then adds the PDF to the database.
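One practical wrinkle is that 50-page OCR limit: anything longer has to go through in batches. Here is a rough sketch of the arithmetic in Python. The function name `page_batches` is made up for illustration, and nothing here calls DEVONthink or ScanSnap Manager; it only works out which page ranges would fit under the limit.

```python
def page_batches(total_pages, limit=50):
    """Return inclusive (first, last) page ranges, each at most `limit` pages.

    `limit` defaults to 50 to mirror the OCR page cap mentioned above;
    the function itself is a hypothetical helper, not part of any product.
    """
    if total_pages < 0 or limit < 1:
        raise ValueError("total_pages must be >= 0 and limit must be >= 1")
    return [
        (start, min(start + limit - 1, total_pages))
        for start in range(1, total_pages + 1, limit)
    ]


if __name__ == "__main__":
    # A 120-page thesis would need three passes.
    for first, last in page_batches(120):
        print(f"scan pages {first}-{last}")
```

A 120-page document comes out as three batches (1-50, 51-100, 101-120), each small enough for a single OCR pass.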
The final result:
- a high-quality PDF that you can print out in the future if necessary (think archive)
- a searchable text layer that gives full-text access to the document
- DEVONthink’s “See Also” button that will, as you view one document, suggest others that might be relevant
Presto…who needs paper?
It’s fast and works astonishingly well. I’ve already built a searchable database of contributor license/release forms from our MARS system (Dorothea, if you’re reading this, I did save the paper copies) and hope to eliminate the remainder of my paper files within the week.
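For the curious, the "searchable text layer" trick boils down to something like an inverted index: a map from each word to the documents that contain it. The toy Python sketch below shows the general idea only. I have no idea how DEVONthink actually builds its index, and the file names and the `build_index` and `search` functions are invented for the example.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


if __name__ == "__main__":
    # Stand-ins for the text an OCR pass might pull out of two scanned forms.
    docs = {
        "release-001.pdf": "Contributor license form, signed and dated",
        "release-002.pdf": "Release form for a doctoral thesis",
    }
    index = build_index(docs)
    print(sorted(search(index, "license form")))
```

Searching for several words intersects the per-word sets, so only documents containing all of them come back.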
There’s not much to setting up the ScanSnap. Be sure you download the latest version of the ScanSnap Manager application from the Fujitsu website. The version that shipped on CD with my ‘soon-to-be-retired by Fujitsu’ FI-5110EOXM-15PPM unit was not a universal binary. The version on the website for the newer ScanSnap S500M is universal and works fine with the FI-5110.
Within DEVONthink, be sure to uncheck the “Set Attributes” checkbox under OCR in Preferences. If you leave it checked, you’ll have to work much harder to handle multi-page documents.
Windows users, there’s a slightly different ScanSnap model for your side of the house. Ships with Adobe Acrobat as well.
Set up a Sun V245 server for Voyager backup this week. Seems Sun has really changed the default boot behavior of servers in the past couple of years. I spent the better part of a day working my way through a self-paced trial and (mostly) error session learning about ALOM and sc> prompts. I think I prefer the old “boot cdrom” method of installing Solaris but I guess that’s just no longer done.
Anyway, I finally figured out how to reset the NVRAM so the machine would just boot from the disk when it was turned on and now it’s up and running a “stand by” copy of our Voyager system. What I haven’t yet figured out is why the unit refuses to boot from the disk whenever I open the case. If I open the case (e.g., to add memory) I have to reattach my STTY cable and then work through sc> prompts until the thing boots. I’ve now fully populated the motherboard with memory chips so I guess I won’t be opening it for a while, but I really will have to figure out what’s going on.