OnBase - Batch OCR

Overview of Batch OCR

The Optical Character Recognition (OCR) module recognizes printed alphanumeric characters in scanned image documents and image-only PDFs and converts them into text. OnBase OCR supports 18 languages, which makes it suitable for organizations with document sets in multiple languages.

The following output formats are available.
• ASCII Text (Standard)
• ASCII Text (Formatted)
• PDF (several varieties)
• Microsoft Word
• HTML 3.2
• HTML 4.0
• Rich Text Format
• Unicode Text (Standard)
• Unicode Text (Formatted)


OCR is performed after an image has been scanned, imported through the Document Import Processor, or swept into OnBase. OCR settings can be created and saved for multiple Document Types.

Batch OCR at Carleton

OCR processing at Carleton runs on a virtual server that runs the OnBase "Thick" client as a Windows service. OnBase users never need to interact with this server; its only job is to process scanned images so that each one gains a text rendition. The OnBase Batch OCR server runs with the following parameters:

Products Registered: Batch OCR, Production Document Imaging (Kofax or TWAIN), Workstation Client

Server hostname: onbaseocr.ads.carleton.edu

User Windows service runs as: ADS\onbase_ocr_svc

Command line switches: -ODBC="OnBase-production" -SCHED -SCANAUTOOCR -SCANAUTOQUEUE:0

Getting documents OCR'ed

Select the document(s) to OCR, right-click, and select Perform Document Full-Page OCR. The documents are sent to the Awaiting Ad-Hoc OCR queue in the Document Imaging module on the OCR processing server.

After you right-click and select Perform Document Full-Page OCR, a confirmation message appears. The message differs if documents are already in the OCR queue.

Viewing OCR'ed Documents

To view the OCR'ed text renditions from a hit list, highlight the document, right-click, and select the Revisions/Renditions option.

The Document Search Results window shows an entry for each rendition. Double-click a specific rendition to view the document.

To view the renditions from an open document:

  1. Select the Revisions/Renditions option from the Document menu.
  2. A dialog box displays the renditions. Double-click a specific rendition to view the document.

Searching text within OCR'ed Documents

If you have sufficient privileges, you can search for specific text in a text-based document assigned to a Document Type or Document Type Group.

Note: User rights are needed to access the Document Retrieval layout and to view documents.

  1. On the Home tab, click Retrieval.
  2. The Document Retrieval layout opens. Ensure that the Document Retrieval pane is expanded.
  3. Select the Document Type Group(s) and/or Document Type(s) to search.
  4. Click the Text Search hyperlink.
  5. The Text Search window is displayed.
  6. In the Find What field, type the text string you want to search for. The string must contain at least two characters, and at least one character in the string must be a letter or a number. Use the drop-down list to select from previous text searches.

  7. To add more search parameters, expand the Options pane.
  8. Select a Type radio button:
      1. Text - Searches for alphanumeric text.
      2. Number - Searches for numeric values and allows the use of the following operators to limit the search: =, >, <, >=, and =<. You can use and, or, and to as operators to combine values or search for a range of values. For example, type 2009 and 2010 to find documents containing both 2009 and 2010.
        If you are searching for an exact number that is part of an alphanumeric text string, the number will not be found. For example, if you search for 001 and the actual text is ABC001, the value will not be found.

      3. Formatted Number - Searches for numeric values that use formatting characters. For example, to search for all Social Security Numbers greater than 800-00-0000, type > 800-00-0000 in the Find What field. You can use this option with the following operators to limit your search: =, >, <, >=, and =<. The and, or, and to operators can be used to combine values or search for a range of values. For example, type 800-00-0000 to 900-00-0000 to find documents containing values within this range.

    Note: When you search for formatted numbers greater than or less than the entered search string, formatted numbers followed by a period are not included in the search results. For example, if the formatted number is the last word in a sentence, it is omitted from the results.

  9. Select any of the following check boxes as needed:
      1. Select Find First to search for the first instance of the text.
      2. Select Use Wildcards to include wildcard characters in your search criteria.
      3. Select Case Sensitive to return only matches with the same capitalization as the search string.
      4. Select Whole Word Match to return only matches of the exact word.
      5. Select Column Search to search for a text string within specified columns. In the From field, type the character position of the leftmost column to search. The leftmost column of characters in the document is 1, the next column to the right is 2, and so on. In the To field, type the character position of the rightmost column to search. The number in the To field must be greater than or equal to the number in the From field.

    If necessary, you can clear search parameters by clicking Clear Text Constraints.

  10. Click outside the Text Search window, or click Close.
  11. The Text Search window closes and the Text Search field is displayed in the Document Retrieval pane.

  12. The Text Search field contains the Type of search selected in the Text Search window, followed by the entry in the Find What field. You can edit the text search by clicking the Text Search hyperlink. You can also remove the text constraints from the Document Retrieval pane.

  13. Press the Enter key or click Find.
  14. When the search is finished, OnBase displays all matching documents in a Document Search Results list.

  15. Open a document. It opens in the Document Viewer at the page containing the text string you searched for.
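A few of the text search rules in the steps above (the Find What length rule in step 6, the Number behavior in step 8, and Column Search in step 9) can be modeled in a few lines of code. The following is a minimal Python sketch, not OnBase code and not a description of how OnBase implements text search. It assumes a plain-text rendition, treats a Number search as seeing only standalone runs of digits (consistent with the ABC001 example), and treats Column Search positions as 1-based and inclusive. All function names are hypothetical.

```python
import re


def is_valid_find_what(query: str) -> bool:
    """Find What rule (step 6): at least two characters,
    at least one of which is a letter or a number."""
    return len(query) >= 2 and any(ch.isalnum() for ch in query)


def numbers_in(text: str) -> list[int]:
    """Assumption: a Number search only sees standalone digit runs,
    so '001' inside 'ABC001' is never extracted (step 8, item 2)."""
    return [int(tok) for tok in re.findall(r"\b\d+\b", text)]


def number_range_match(text: str, low: int, high: int) -> bool:
    """Rough model of 'low to high': any standalone number in range."""
    return any(low <= n <= high for n in numbers_in(text))


def column_search(text: str, query: str, start_col: int, end_col: int) -> bool:
    """Column Search (step 9): match only within columns
    start_col..end_col (1-based, inclusive) of each line."""
    if start_col < 1 or end_col < start_col:
        raise ValueError("From must be >= 1 and To must be >= From")
    return any(query in line[start_col - 1:end_col] for line in text.splitlines())


if __name__ == "__main__":
    print(is_valid_find_what("a"), is_valid_find_what("a1"))  # False True
    print(numbers_in("Invoice ABC001 dated 2009"))  # [2009]
    print(number_range_match("Fiscal years 2009 and 2010", 2009, 2010))  # True
    doc = "Name   ID\nSmith  0042"
    print(column_search(doc, "0042", 8, 11), column_search(doc, "0042", 1, 5))  # True False
```

The word-boundary pattern is what reproduces the ABC001 behavior in this sketch: a run of digits attached to letters has no word boundary in front of it, so it is never extracted as a number.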