Turning your paper documents into editable electronic
text files requires a scanner and optical character recognition
(OCR) software. Without such software, scanned documents are stored
only as images (pictures) that cannot be edited. Worse, images (e.g.
GIF and JPEG files) are significantly larger than text files, with the
result that transfer times on the Internet, for both uploading and
downloading, are much greater. It is, therefore, essential to convert
scanned documents into editable text files, such as Microsoft Word
documents, before sending them over the Internet.
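To give a rough sense of the size gap, the short Python sketch below compares an uncompressed scan with the same page stored as plain text. The figures are illustrative assumptions, not measurements from our tests: a letter-size page (8.5 x 11 inches) scanned at 300 dpi in 8-bit grayscale, and roughly 3,000 characters of text per page at one byte each. Compressed formats such as GIF or JPEG would come in below the raw figure, but still well above the text.

```python
# Illustrative assumptions only (not measurements from the tests in this article).
PAGE_WIDTH_IN = 8.5     # assumed letter-size page width, in inches
PAGE_HEIGHT_IN = 11     # assumed letter-size page height, in inches
DPI = 300               # assumed scan resolution, dots per inch
BYTES_PER_PIXEL = 1     # assumed 8-bit grayscale
CHARS_PER_PAGE = 3000   # assumed amount of text on one page


def raw_scan_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Size of an uncompressed scan: one value stored for every pixel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def plain_text_bytes(chars, bytes_per_char=1):
    """Size of the same page stored as plain text."""
    return chars * bytes_per_char


scan = raw_scan_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, DPI, BYTES_PER_PIXEL)
text = plain_text_bytes(CHARS_PER_PAGE)

print(f"Raw scan:   {scan:,} bytes")
print(f"Plain text: {text:,} bytes")
print(f"The scan is roughly {scan // text:,} times larger")
```

Under these assumptions the raw scan works out to about 8.4 million bytes against 3,000 bytes of text. A word-processor file carries formatting overhead on top of the raw characters, so the real-world gap is narrower, but the order of magnitude holds.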
Optical character recognition (OCR) is the process
of turning a scanned image into computer-editable text so that you
do not have to retype the text manually. When initially scanned,
a text document is nothing more than an electronic picture, or photograph,
which is composed of many tiny dots (pixels). The characters, or
text, you see in such images cannot be edited, in that word-processing
programs cannot see the alphanumeric characters. In short, that
contract you just finished scanning for editing purposes is seen
only as one very large image, and might just as well have been a
picture of a tree. To create an editable text file, your scans must
be processed through an OCR program.
Adjusters Asia has tested the two most popular
OCR programs on the market today: OmniPage Pro by Caere Corp. and TextBridge
Pro Millennium by ScanSoft.
An HP ScanJet 6300C USB flatbed scanner was used
to perform the tests.
The first test involved a simple text document.
The two programs performed almost identically, each taking less
than 30 seconds to convert the scanned page into an editable MS
Word document. Character (text) recognition was also flawless,
with both OmniPage and TextBridge scoring 100% accuracy.
This is a remarkable achievement when compared to the early days
of OCR technology.
Again, these tests involved a simple typed document
containing no pictures.
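For readers who want to score their own scans, here is a minimal Python sketch of one common way to express character accuracy. It is not necessarily the method used in our tests, which were judged by comparing the output against the original page. The sketch assumes you have a correct reference copy of the text, counts the single-character insertions, deletions and substitutions needed to turn the OCR output into that reference (the Levenshtein edit distance), and reports the result as a percentage. The function names and sample strings are hypothetical.

```python
def edit_distance(reference, recognized):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions between two strings."""
    previous = list(range(len(recognized) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, rec_char in enumerate(recognized, start=1):
            cost = 0 if ref_char == rec_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the OCR output
                current[j - 1] + 1,      # extra character in the OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, recognized):
    """Percentage of reference characters recognized correctly."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, recognized)
    return max(0.0, 1.0 - errors / len(reference)) * 100


# Hypothetical sample: the OCR output reads the letter "o" as a zero.
print(character_accuracy("contract", "contract"))
print(character_accuracy("contract", "c0ntract"))
```

A perfect conversion scores 100, while the single misread character in the eight-letter sample brings the score down to 87.5.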
The second test involved a page from a magazine
containing both text and pictures in a multi-column layout.
Once again, the two programs took about
the same length of time to process the scan: about 60 seconds in
total, double that of the simple text document. Character
recognition was again flawless, with each utility
registering a perfect score.
The difference was in replicating the layout of
the magazine page. While not perfect, OmniPage retained the characteristics
and layout of the original document, reproducing the columns and
images on a single page in Microsoft Word. By contrast, TextBridge
spread the scan across two pages of an MS Word document, failing
to replicate the appearance of the original document. On
closer inspection, TextBridge had inserted unnecessary line breaks,
which lengthened the document.
Newsworthy: On March
13, 2000, ScanSoft acquired the assets of Caere Corporation, with
the result that this same company now owns the two most popular
OCR programs on the market.
For speed and accuracy, the two OCR programs
were found to be virtually identical. If replicating the layout
of multi-column articles is your thing, though, you might want to
consider OmniPage Pro. Based on our own tests, this particular utility
would appear to have a slight edge over TextBridge Pro, but certainly
not enough to justify a price tag of $499.00, versus only $79.99
for TextBridge Pro. In fact, you'd pretty much have to be experiencing
a total electrical blackout above the shoulders to fork over the
additional $400.00 plus for OmniPage Pro.
Conclusion: stick with TextBridge; you'll be much
richer for the experience.
To learn more about these products, as well as a
sister utility known as OmniForm, visit the ScanSoft
web site.