OnBase - Batch OCR

Overview of Batch OCR

The Optical Character Recognition (OCR) module recognizes printed alphanumeric characters in scanned image documents and image-only PDFs and converts them into characters in a text document. OnBase OCR supports 18 languages, which makes it well suited to international companies or businesses with document sets in multiple languages.

The following output formats are available:
• ASCII Text (Standard)
• ASCII Text (Formatted)
• PDF (several varieties)
• Microsoft Word
• HTML 3.2
• HTML 4.0
• Rich Text Format
• Unicode Text (Standard)
• Unicode Text (Formatted)

OCR is performed after an image has been scanned, imported through the Document Import Processor, or swept into OnBase. OCR settings can be created and saved for multiple Document Types.

Batch OCR at Carleton

OCR processing at Carleton occurs on a virtual server that runs the OnBase "Thick" client as a Windows service. This server requires no interaction from OnBase users; its only role is to process scanned images so that each one gains a text rendition. The OnBase Batch OCR server runs with the following parameters:

Products Registered: Batch OCR, Production Document Imaging (Kofax or TWAIN), Workstation Client

Server hostname: onbaseocr.ads.carleton.edu

User Windows service runs as: ADS\onbase_ocr_svc

Command line switches: -ODBC="OnBase-production" -SCHED -SCANAUTOOCR -SCANAUTOQUEUE:0
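
When auditing the service, it can help to compare the switches actually configured on the server with the ones documented here. The following is a minimal sketch in Python, not part of OnBase: the helper names parse_switches and switches_match are hypothetical, and the parsing rule (a switch name optionally followed by "=" or ":" and a value) is an assumption inferred only from the switch string above. It does not interpret what any switch does.

```python
import re
import shlex

# Switch string copied verbatim from the parameters listed above.
DOCUMENTED_SWITCHES = '-ODBC="OnBase-production" -SCHED -SCANAUTOOCR -SCANAUTOQUEUE:0'

# Assumed token shape: dash, name, then an optional value after "=" or ":".
SWITCH_PATTERN = re.compile(r"^-+([^=:]+)(?:[=:](.*))?$")


def parse_switches(command_line):
    """Return {SWITCH_NAME: value}; switches without a value map to None."""
    parsed = {}
    for token in shlex.split(command_line):  # shlex strips the double quotes
        match = SWITCH_PATTERN.match(token)
        if match is None:
            raise ValueError(f"not a switch: {token!r}")
        name, value = match.groups()
        parsed[name.upper()] = value
    return parsed


def switches_match(actual_command_line, documented=DOCUMENTED_SWITCHES):
    """Compare two switch strings, ignoring switch order and name case."""
    return parse_switches(actual_command_line) == parse_switches(documented)
```

Passing the switch string copied from the Windows service definition to switches_match returns True only when it carries the same switches and values as the documented set, regardless of the order in which they appear.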

Getting documents OCR'ed

Select the document(s) for OCR. Right-click and select Perform Document Full-Page OCR. The documents will be sent to the Awaiting Ad-Hoc OCR queue in the Document Imaging module on the OCR processing server.

After you select Perform Document Full-Page OCR, a confirmation message appears. The message varies if documents already exist in the OCR queue.

Viewing OCR'ed Documents

To view the OCR'ed text renditions from a hit list, highlight the document, right-click, and select the Revisions/Renditions option.

The Document Search Results window shows an entry for each rendition.

Double-click a specific rendition to view the document.
To view the renditions from an open document, select the Revisions/Renditions option from the Document menu.
A dialog box displays the renditions. Highlight the rendition and click OK to view the document.

Searching text within OCR'ed Documents

If you have sufficient privileges, you can search for specific text in a text-based
document assigned to a Document Type or Document Type Group.
Note: User rights are needed to access the Document Retrieval layout and to view
documents. Contact your system administrator for additional information.

In the Home tab, click Retrieval.
The Document Retrieval layout opens. Ensure that the Document Retrieval pane is expanded.
Select the Document Type Group(s) and/or Document Type(s) to search.
Click the Text Search hyperlink.
The Text Search window is displayed.