Monday, January 31, 2011

Optical Character Recognition

Optical Character Recognition (commonly OCR) is a process that uses specialized software to interpret the characters in a scanned image and convert them into standardized text. The main benefit of OCR is that it enables full-text search (e.g. by keyword or phrase). Further, SoftFile can offer full-text search not only within a single document, but across the entire library of scanned documents.
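
To make the idea of searching across a whole library concrete, here is a minimal sketch of a keyword and phrase search over OCR output. It assumes the OCR text for each scanned document is already available as a plain string; the file names, sample text, and function names are hypothetical and are not SoftFile's actual implementation.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(library):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in library.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, library, query):
    """Return names of documents containing the keyword or exact phrase."""
    words = tokenize(query)
    if not words:
        return []
    # Candidate documents must contain every word in the query.
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    # Confirm the words appear consecutively; the padding spaces
    # prevent matches that start or end in the middle of a word.
    phrase = " " + " ".join(words) + " "
    return sorted(
        name
        for name in candidates
        if phrase in " " + " ".join(tokenize(library[name])) + " "
    )


# Hypothetical OCR output for three scanned documents.
library = {
    "invoice_001.txt": "Invoice for scanning services, due on receipt.",
    "deed_1974.txt": "This deed transfers the property described below.",
    "invoice_002.txt": "Invoice for microfilm conversion services.",
}
index = build_index(library)

keyword_hits = search(index, library, "invoice")
phrase_hits = search(index, library, "scanning services")
```

The index narrows the search to documents that contain every query word, so only those few documents need to be checked for the exact phrase.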

Although a mature technology, OCR is still prone to a certain error rate. Errors are especially prevalent in historical documentation, that is, records not generated by high-quality print devices (e.g. a laser printer). As such, records originating from typewriters and dot-matrix printers do not "OCR" well. Analog media such as microforms (e.g. aperture cards, microfilm and microfiche) also generally do not respond well to an OCR process.
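
One common way to put a number on OCR errors is the character error rate: the number of single-character edits needed to turn the OCR output into the correct text, divided by the length of the correct text. The sketch below assumes a hand-checked reference transcription is available for comparison; the sample strings and function names are illustrative only.

```python
def edit_distance(reference, recognized):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(recognized) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i]
        for j, rec_ch in enumerate(recognized, start=1):
            cost = 0 if ref_ch == rec_ch else 1
            curr.append(
                min(
                    prev[j] + 1,         # character missing from the OCR output
                    curr[j - 1] + 1,     # extra character in the OCR output
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def character_error_rate(reference, recognized):
    """Edits needed per character of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, recognized) / len(reference)


# Hypothetical example: the OCR engine confused "I" with "l" and "0" with "O".
reference = "Invoice 1001"
recognized = "lnvoice 10O1"
errors = edit_distance(reference, recognized)
rate = character_error_rate(reference, recognized)
```

Here two of the twelve reference characters were misread, so the rate works out to 2/12, or roughly 17 percent.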

The error rate for documents originating from a high-quality print device is generally acceptable to most end-users.

Optical Character Recognition should not be confused with electronic content re-mastering of a document, which includes laborious OCR clean-up in order to restore a scanned document to Microsoft Word or another native document format.

Zonal OCR is a subset of full-text OCR. Where full-text OCR attempts to capture the entire text of a scanned document, zonal OCR instructs the software to capture content only from designated 'zones' where the required data should appear on standardized forms.

In some instances (for example, when the original paper comes from a high-quality print device), zonal OCR can be used instead of manual data entry.
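
The zone idea can be sketched in a few lines. This example assumes the OCR engine has already returned each recognized word with its position on the page, which many engines can do; the `Word` and `Zone` types, the coordinates, and the field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # horizontal centre of the word on the page
    y: float  # vertical centre of the word on the page


@dataclass(frozen=True)
class Zone:
    name: str  # the data field this zone is expected to hold
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word):
        """True when the word's centre falls inside this rectangle."""
        return (
            self.left <= word.x <= self.right
            and self.top <= word.y <= self.bottom
        )


def extract_fields(words, zones):
    """Return {zone name: text found inside that zone}, in reading order."""
    fields = {}
    for zone in zones:
        inside = sorted(
            (w for w in words if zone.contains(w)),
            key=lambda w: (w.y, w.x),
        )
        fields[zone.name] = " ".join(w.text for w in inside)
    return fields


# Hypothetical OCR output for one page of a standardized form.
words = [
    Word("Invoice", 50, 20), Word("No.", 110, 20), Word("10482", 400, 20),
    Word("Date", 50, 60), Word("01/31/2011", 400, 60),
    Word("Customer", 50, 100), Word("Jane", 400, 100), Word("Smith", 450, 100),
]

# Zones cover the right-hand column, where this form puts its values.
zones = [
    Zone("invoice_number", 350, 10, 550, 30),
    Zone("date", 350, 50, 550, 70),
    Zone("customer", 350, 90, 550, 110),
]

fields = extract_fields(words, zones)
```

Because the form's labels sit outside the zones, only the values are captured. That is what allows the output to stand in for manual data entry when the print quality is good.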

Enterprise Content Management

Enterprise Content Management is the strategy or official protocol an organization employs to manage its business documents. Generally this includes all records: paper, electronically scanned documents, documents that were created electronically and remain so (such as email or website pages), and microforms (such as microfilm and microfiche).

With respect to the management of electronic-only content, such a system is generally known by any of the following names:

  • Electronic Content Management (ECM)
  • Electronic Content Management System (ECMS)
  • Electronic Document Management (EDM)
  • Electronic Document Management System (EDMS)

Within certain industry verticals or for specific document types, this same system may go by another name. In healthcare, for example, it might be called an Electronic Health Record (EHR) or Electronic Health Record system.

When an organization decides to go paperless, it must either choose a commercial off-the-shelf system (of which there are hundreds) or develop its own in-house system. There are pros and cons to each option. When documents are scanned by SoftFile, we can either include a commercial ECM system or develop a custom non-proprietary database. If the customer already has an ECM system, the electronic documents and data captured by SoftFile can likely be imported into the existing system.
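
As a rough illustration of what a small non-proprietary document database could look like, the sketch below uses SQLite through Python's standard `sqlite3` module. The table layout, column names, and sample records are assumptions made for this example; they do not describe the schema SoftFile or any particular ECM product uses.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS documents (
            id        INTEGER PRIMARY KEY,
            file_name TEXT NOT NULL UNIQUE,
            doc_type  TEXT,
            ocr_text  TEXT
        )
        """
    )
    return conn


def add_document(conn, file_name, doc_type, ocr_text):
    """Store one scanned document together with its OCR text."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO documents (file_name, doc_type, ocr_text) "
            "VALUES (?, ?, ?)",
            (file_name, doc_type, ocr_text),
        )


def find_documents(conn, keyword):
    """Return file names whose OCR text contains the keyword.

    LIKE is case-insensitive for ASCII letters in SQLite by default.
    A keyword containing % or _ would be treated as a wildcard.
    """
    rows = conn.execute(
        "SELECT file_name FROM documents "
        "WHERE ocr_text LIKE ? ORDER BY file_name",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]


conn = create_store()
add_document(conn, "invoice_001.pdf", "invoice", "Invoice for scanning services")
add_document(conn, "deed_1974.pdf", "deed", "This deed transfers the property")

matches = find_documents(conn, "invoice")
```

Because the data sits in a standard, openly documented file format, it can later be exported with ordinary SQL queries if the organization moves to a different system.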