Overview of Batch OCR
The Optical Character Recognition (OCR) module recognizes printed alphanumeric characters in scanned image documents and image-only PDFs and converts them into text documents. OnBase OCR supports 18 languages, which makes it well suited to organizations whose document sets span multiple languages.
The following output formats are available:
• ASCII Text (Standard)
• ASCII Text (Formatted)
• PDF (several varieties)
• Microsoft Word
• HTML 3.2
• HTML 4.0
• Rich Text Format
• Unicode Text (Standard)
• Unicode Text (Formatted)
OCR is performed after an image has been scanned, imported with the Document Import Processor, or swept into OnBase. OCR settings can be created and saved for multiple Document Types.
Batch OCR at Carleton
OCR processing at Carleton occurs on a virtual server that runs the OnBase "Thick" client as a Windows service with the following parameters:
Products Registered: Batch OCR, Production Document Imaging (Kofax or TWAIN), Workstation Client
Server hostname: onbaseocr.ads.carleton.edu
Windows service account: ADS\onbase_ocr_svc
Command line switches: -ODBC="OnBase-production" -SCHED -SCANAUTOOCR -SCANAUTOQUEUE:0
Getting documents OCR'ed
From a document retrieval hit list
Select the document(s) to OCR, right-click, and select Perform Document Full-Page OCR. The documents will be sent to the Awaiting Ad-Hoc OCR queue in the Document Imaging module on the OCR processing server.
After you select Perform Document Full-Page OCR, a confirmation message appears. The wording of the message differs if documents already exist in the OCR queue.
From the Import File dialog in the OnBase Client
Select the Queue Document for OCR check box. The document will be sent to the Awaiting Ad-Hoc OCR queue in the Document Imaging module.
Viewing OCR'ed Documents