Overview of Batch OCR

The Optical Character Recognition (OCR) module recognizes printed alphanumeric characters in scanned image documents and image-only PDFs and translates them into characters in a text document. OnBase OCR supports 18 languages, making it a good fit for international companies or businesses with document sets in multiple languages.

The following output formats are available.
• ASCII Text (Standard)
• ASCII Text (Formatted)
• PDF (several varieties)
• Microsoft Word
• HTML 3.2
• HTML 4.0
• Rich Text Format
• Unicode Text (Standard)
• Unicode Text (Formatted)


OCR is performed after an image has been scanned, processed by the Document Import Processor, or swept into OnBase. OCR settings can be created and saved for multiple Document Types.

Batch OCR at Carleton

OCR processing at Carleton occurs on a virtual server that runs the OnBase "Thick" client as a Windows service. This server requires no interaction from OnBase users; it simply processes scanned images so that each one gains a text rendition. The OnBase Batch OCR server runs with the following parameters:

Products Registered: Batch OCR, Production Document Imaging (Kofax or TWAIN), Workstation Client

Server hostname: onbaseocr.ads.carleton.edu

User Windows service runs as: ADS\onbase_ocr_svc

Command line switches: -ODBC="OnBase-production" -SCHED -SCANAUTOOCR -SCANAUTOQUEUE:0

Getting documents OCR'ed

Select the document(s) for OCR. Right-click and select Perform Document Full-Page OCR. The documents will be sent to the Awaiting Ad-Hoc OCR queue in the Document Imaging module on the OCR processing server.

After you right-click and select Perform Document Full-Page OCR, a confirmation message is displayed. The message will vary if documents already exist in the OCR queue.

Viewing OCR'ed Documents

To view the OCR'ed text renditions from a hit list, highlight the document, right-click,
and select the Revisions/Renditions option.

 

The Document Search Results window shows an entry for each rendition.

  1. Double-click a specific rendition to view the document.
  2. To view the renditions from an open document, select the Revisions/Renditions option from the Document menu.
  3. A dialog box displays the renditions. Highlight the rendition and click OK to view the document.

Searching text within OCR'ed Documents

If you have sufficient privileges, you can search for specific text in a text-based
document assigned to a Document Type or Document Type Group.
Note: User rights are needed to access the Document Retrieval layout and to view
documents. Contact your system administrator for additional information.

  1. In the Home tab, click Retrieval.
  2. The Document Retrieval layout opens. Ensure that the Document Retrieval pane is expanded.
  3. Select the Document Type Group(s) and/or Document Type(s) to search.
  4. Click the Text Search hyperlink.
  5. The Text Search window is displayed.
  6. In the Find What field, type the text string you want to search for. The string must
    contain at least two characters, and at least one character in the string must be a
    letter or a number. Use the drop-down list to select from previous text searches.

  7. To add additional search parameters, expand the Options pane.
  8. Select a Type radio button:
      1. Text - Searches for alphanumeric text.
      2. Number - Searches for numeric values and allows the use of the following
        operators to limit the search: =, >, <, >=, and =<. You can use and, or, and to
        as operators to search for a range of values. For example, type 2009 and 2010
        to find documents containing both 2009 and 2010.
        If you are searching for an exact number that is part of an alphanumeric text
        string, then the number will not be found. For example, if you search for 001
        and the actual text is ABC001, then the value will not be found.

      3. Formatted Number - Searches for numeric values that use formatting characters.
        For example, to search for all Social Security Numbers greater than
        800-00-0000, type > 800-00-0000 in the Search String field. You can use this option with
        the following operators to limit your search: =, >, <, >=, and =<. The and, or, and
        to operators can be used to search for a range of values. For example, type
        800-00-0000 to 900-00-0000 to find documents containing values within this
        range.

    Note: When you search for formatted numbers greater than or less than the entered search string, formatted numbers followed by periods are not included in the search results. For example, if the formatted number is the last word in a sentence, it is omitted from the results.

  9. Select one of the following check boxes if necessary:
      1. Select Find First to search for the first instance of the text.
      2. Select Use Wildcards to include wildcard characters in your text string search criteria.

      3. Select Case Sensitive to return only matches that have the same capitalization as the text string search criteria.

      4. Select Whole Word Match to return matches for an exact word.

      5. Select Column Search to search for a text string within specified columns. In the From field, type the character position of the column to start the search in (the leftmost column to be searched). The column of characters at the far left of the document is 1, the next column to the right is 2, and so on. In the To field, type the character position of the column to end the search in (the rightmost column to be searched). The number in the To field must be greater than or equal to the number in the From field.
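The search rules described in the steps above can be illustrated with a short Python sketch. This is not OnBase code and does not call any OnBase API; the function names (is_valid_find_what, number_search, column_search) are hypothetical, and the matching details are assumptions made for illustration: the number is compared as a literal token rather than by numeric value, and a column match must fall entirely between the From and To positions.

```python
import re


def is_valid_find_what(text):
    """Find What rule: at least two characters, at least one a letter or number."""
    return len(text) >= 2 and any(ch.isalnum() for ch in text)


def number_search(document, number):
    """Number rule: match only a standalone number, not one embedded in an
    alphanumeric string (searching 001 does not find ABC001).
    Assumption: the number is compared as a literal token."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(number) + r"(?![A-Za-z0-9])"
    return re.search(pattern, document) is not None


def column_search(document, text, from_col, to_col):
    """Column Search rule: columns are 1-based and To must be >= From.
    Returns the 1-based line numbers where text lies within the column range.
    Assumption: the whole match must fit between From and To."""
    if to_col < from_col:
        raise ValueError("To must be greater than or equal to From")
    hits = []
    for line_no, line in enumerate(document.splitlines(), start=1):
        # Convert the 1-based inclusive column range to a Python slice.
        window = line[from_col - 1:to_col]
        if text in window:
            hits.append(line_no)
    return hits
```

For example, with a two-line document "ABC  2009" followed by "2009 ABC", a column search for 2009 from column 1 to column 4 matches only the second line, because the first line has ABC in those columns.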
