Monday, January 31, 2011

Optical Character Recognition

Optical Character Recognition (commonly OCR) is a process that uses specialized software to interpret the characters in a scanned image and convert them into standardized text. The main benefit of OCR is that it makes full-text search possible (e.g. by keyword or phrase). Further, SoftFile can offer full-text search not only within a single document, but across the entire library of scanned documents.
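To make the idea of searching across a whole library concrete, here is a minimal Python sketch of a keyword index built over OCR output. It is only an illustration, not a description of how SoftFile implements search: the file names, the sample text, and the `build_index` and `search` helpers are all hypothetical, and the search matches documents containing every keyword rather than an exact phrase.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical OCR output for three scanned documents.
library = {
    "invoice_001.pdf": "Invoice for laser printer toner",
    "memo_1987.pdf": "Memo regarding microfilm storage",
    "invoice_002.pdf": "Invoice for microfilm scanning services",
}
index = build_index(library)
print(search(index, "invoice microfilm"))
```

Because the index is built from the OCR text, any character the OCR step misreads becomes a word that a keyword search will not find, which is why OCR accuracy matters for search.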

Although a mature technology, OCR is still prone to a certain error ratio. Errors are especially prevalent in historical documentation, that is, records not generated by high quality print devices (e.g. a laser printer). As such, records originating from typewriters and dot-matrix style printers do not "OCR" well. Analog media such as microforms (e.g. aperture cards, microfilm and microfiche) also do not generally respond well to an OCR process.
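One common way to put a number on an OCR error ratio is the character error rate: the edit distance between the OCR output and a hand-checked reference transcription, divided by the length of the reference. The Python sketch below assumes this definition; the post itself does not specify how the error ratio is measured, and the function names are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, ocr_output):
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


# Hypothetical example: the OCR engine misreads "l" as "1".
print(character_error_rate("laser printer", "1aser printer"))
```

Measuring this on a small hand-checked sample of pages gives a rough idea of whether a given batch of source material will OCR well enough for its intended use.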

The error ratio for documents originating from a high quality print device is generally acceptable to most end-users.

Optical Character Recognition should not be confused with electronic content re-mastering of a document, which includes laborious OCR clean-up in order to restore a scanned document to Microsoft Word or another native document format.

Zonal OCR is a subset of full-text OCR. Where full-text OCR attempts to capture the entire text of a scanned document, zonal OCR instructs the software to capture content only from designated 'zones,' where the required data should appear on standardized forms.

In some instances (for example, when the original paper comes from a high quality print device), Zonal OCR can be used instead of manual data entry.
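As a rough illustration of zonal capture, the Python sketch below defines named zones as pixel rectangles on a standardized form and collects the recognized words whose centers fall inside each zone. It assumes the OCR engine has already returned words with bounding boxes; the zone coordinates, field names, and sample words are hypothetical. Many zonal OCR tools instead crop each zone from the image before recognition, so treat this as one possible approach rather than the standard one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A named rectangle on the form, in pixel coordinates."""
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


def extract_zones(words, zones):
    """Collect the text of each zone.

    words is a list of (text, left, top, right, bottom) tuples in reading
    order. A word is assigned to the first zone containing its center.
    """
    fields = {zone.name: [] for zone in zones}
    for text, left, top, right, bottom in words:
        center_x = (left + right) / 2
        center_y = (top + bottom) / 2
        for zone in zones:
            if zone.contains(center_x, center_y):
                fields[zone.name].append(text)
                break
    return {name: " ".join(parts) for name, parts in fields.items()}


# Hypothetical zone template for a standardized invoice form.
FORM_ZONES = [
    Zone("invoice_number", 400, 50, 600, 90),
    Zone("customer_name", 50, 120, 400, 160),
]

# Hypothetical OCR output: (text, left, top, right, bottom).
sample_words = [
    ("INV-1042", 420, 55, 520, 80),
    ("Jane", 60, 125, 110, 150),
    ("Doe", 120, 125, 165, 150),
    ("Thank", 60, 700, 120, 725),  # outside every zone, so ignored
]

print(extract_zones(sample_words, FORM_ZONES))
```

The extracted fields can then feed a database or index directly, which is the step that would otherwise require manual data entry.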
