The startling drop in scanner prices has pushed a
niche peripheral into the mainstream and added OCR
(optical character recognition) to the everyday
vocabulary.
Simply put, OCR software takes a scan of a printed page,
which is otherwise just a bitmap (pixel) graphic, an
ordered collection of dots, and converts it to text that
a word processor can edit.
OCR software falls readily into two groups: basic and
high-end. The first group includes most of the products
bundled with scanners (often called limited or special
editions); many offer surprising sophistication and would
suffice for most casual users. High-end OCR programs are
better suited to multiple columns of text or to pages
that combine text, charts and images. If that describes
your needs, read on.
Two leading contenders are Xerox's Textbridge Pro 98
and Caere's OmniPage Pro, version 8. I tried their demo
versions (limited to 15 days for Textbridge and 25 scans
for OmniPage) on a variety of documents and was
hard-pressed to choose a winner.
The ideal is 100 percent text recognition. In
practice, accuracy depends on many interrelated factors
and varies: one program tended to edge out the other on
one document but lose on the next. Some judicious
tinkering with the brightness and contrast controls often
helps. Either way, both programs processed documents
faster than I could type them myself. When pages are
complex (text and images, for example), manual or
automatic zoning, which marks off regions of the page as
text or graphics, is often needed to separate the
components.
Caere OmniPage Pro
OmniPage Pro is expensive if you must buy the full
version (about $600). Caere has reversed a long-standing
policy of offering upgrades only to users of its own
products (including entry-level and bundled products);
it now sells the upgrade (about $170) in local retail
outlets to anyone who owns any OCR program. Textbridge is
also available locally, either as a full version or as an
upgrade from any OCR program, and it is usually less
expensive (the upgrade is about $100).
Both programs integrate with recent versions of
WordPerfect and MS Word for Windows and offer output
formats for a wide range of other applications. Check
first that your particular product and version is
supported. Only Textbridge outputs documents in Adobe's
Acrobat (PDF) format, in normal, image-only and
image-plus-text modes. Both support TWAIN scanners, and
most recently manufactured scanners support this
standard. Textbridge also supports many ISIS-standard
scanners, while OmniPage Pro integrates support for
Hewlett-Packard's AccuScan feature.
A chance conversation with an OmniPage Pro owner
revealed one annoying feature. After buying the product,
you are forced to register it (by toll-free phone call or
on the company's Web site) or it will stop working after
25 sessions; copy protection is alive and well! Worse,
you must repeat the process each time the program is
removed and reinstalled.
Xerox applications
Xerox also offers two document management programs
that provide direct scanning into most office suites,
OCR, image editing (using MGI PhotoSuite), document
indexing and searching (including Boolean, proximity and
natural-language searches), a colour copy function (which
needs a colour scanner and printer), a fax facility and a
forms fill-in module. The more robust of the two, Pagis
Pro version 2, incorporates Textbridge Pro 98 and costs
about $130. Its less expensive companion, Pagis
ScanWorks, substitutes Textbridge Classic but retains the
other features; it costs about $100. A few stores may
still offer Pagis Pro 97. For about $100 retail, it
includes the full version of Textbridge (Textbridge 96,
not 98) and the associated document management features,
but no image editor.
Pagis Pro and ScanWorks store documents in a
proprietary format, XIF, that can store different parts
of a page in different ways (for example, at different
resolutions and bit depths for text and for graphics) to
balance file size, resolution and colour fidelity. To
make this work, your TWAIN driver is overlaid with a
proprietary interface. Some TWAIN drivers will not
cooperate and generate error messages. For these stubborn
cases, an option to use the scanner's own proprietary
interface instead is provided. (Be careful: this option
may not be available in the earlier Pagis Pro 97.)
Unfortunately, scanners that are not compatible may lose
some XIF-based features.
Pagis Pro is an intriguing product: it sells for
little more (about $130) than Textbridge Pro alone, yet
it has received very good reviews. Check the supported
scanner list at www.pagis.com before purchasing.
Adobe Capture
Finally, no look at advanced OCR software is complete
without Adobe Capture, a module within Adobe's Acrobat
suite. It takes a distinctive approach to OCR. The
programs above translate the bitmap graphic into text, or
into their "best guess," which may include wrong
characters or simply gibberish. When Acrobat Capture
cannot reliably recognize part of a page, it places that
portion of the original bitmap in the document instead:
no guessing. The integrated module serves "average"
users, while high-volume (presumably corporate) users can
upgrade; the trade-off is more features for additional
licensing fees. Other Acrobat modules add document
management features that not only preserve the original
format of complex documents but also permit extensive
reformatting, indexing and even multimedia additions.
Pagis Pro 2 offers some of these features, but the
Acrobat suite remains unique in others; it is also more
expensive (about $300).
From my limited perspective, Acrobat documents seem
pervasive on the Web. Another benchmark: today's
CD-ROM-based software frequently includes its user
manuals as Acrobat files. These are personal
observations, not a formal survey. If price is the
principal consideration, Pagis Pro is the less expensive
choice. However, needs such as cross-platform
compatibility, Web publishing, large-scale enterprise
publishing and extensive multimedia integration would
likely tip the decision toward Adobe. Both companies are
powerhouses in document management, and I expect each
product to have a devoted following.
Bottom Line:
OmniPage Pro (Proprietary, $600)
Caere Corporation
Pagis Pro Version 2 (Proprietary, $130)
Pagis ScanWorks (Proprietary, $100)
Xerox Corporation
http://www.pagis.com
Adobe Capture Module in Adobe Acrobat (Proprietary, $300)
Adobe Systems Incorporated
Originally published: November, 1998