Capture Text from Camera with iOS

This guide walks you through a simple real-time text capture scenario, in which the user points the device's camera at the text to be recognized.

How it Works

Real-Time Recognition SDK for iOS enables your application to capture information directly from the smartphone camera preview frames, without actually snapping a picture. Once you start capturing, the Real-Time Recognition SDK engine automatically requests new camera frames and processes them, using each new frame to verify and improve the recognition result obtained from the previous frames. This process continues until the result reaches the required stability level. Combining several frames enables Real-Time Recognition SDK to recognize text even in situations where it is hard to obtain a still photo of suitable quality for recognition.

Note that Real-Time Recognition SDK also supports recognizing text on an image that was already saved to a file, which allows it to process existing photos, scanned texts, and so on. See Recognize Text on Photos for the description of this scenario.

Implementation

Note: Before you begin, see Build your application with the OCR library for iOS.

To implement the real-time text capture scenario, follow these steps:

  1. Implement a delegate conforming to the RTRTextCaptureServiceDelegate protocol. The delegate will handle messages from the text capture service. Step 6 below describes the messages it will receive, and a skeleton implementation is sketched after this list.
  2. Create an RTREngine object using the sharedEngineWithLicenseData: method. The method requires an NSData object containing your license data. For example, you can use dataWithContentsOfFile: to create a data object, then pass this object to the sharedEngineWithLicenseData: method (an engine creation sketch follows the list).
  3. Use the createTextCaptureServiceWithDelegate: method of the RTREngine object to create a background text capture service. Only one instance of the service per application is necessary: multiple threads will be started internally. A combined creation-and-configuration sketch follows the list.
  4. Configure the text capture service:
    • If you are using a recognition language different from English, specify it using the setRecognitionLanguages: method. Multiple languages are also supported, although setting too many languages may decrease recognition performance.
    • Your application can automatically translate the recognized text. To enable translation, add a dictionary using the setTranslationDictionary: method.
      Note that when a dictionary is set, recognition results are returned in the target language, and text in the source language is no longer available.
    • It is also recommended to call the setAreaOfInterest: method to specify the rectangular area on the frame where the text is likely to be found. For example, your application may show a highlighted rectangle in the UI into which the end user will try to fit the text they are capturing. The best result is achieved when the area of interest does not touch the boundaries of the frame but has a margin of at least half the size of a typical printed character.
  5. Implement a delegate that adopts the AVCaptureVideoDataOutputSampleBufferDelegate protocol. Instantiate an AVCaptureSession object, add video input and output and set the video output delegate. When the delegate receives a video frame via the captureOutput:didOutputSampleBuffer:fromConnection: method, pass this frame on to the text capture service by calling the addSampleBuffer: method.
    We recommend using the AVCaptureSessionPreset1280x720 preset for your AVCaptureSession.
    Also note that your video output must be configured to use the kCVPixelFormatType_32BGRA video pixel format. A capture session sketch follows the list.
  6. Process the messages sent by the service to the RTRTextCaptureServiceDelegate delegate object.
    The result will be delivered via the onBufferProcessedWithTextLines:resultStatus: method. This method also reports the result stability status, which indicates whether the result is available and whether it is likely to improve as further frames are added (see the resultStatus parameter). Use it to decide when your application should stop processing and display the result to the user. We do not recommend using the result until the stability level has reached at least RTRResultStabilityAvailable.
    The result consists of one or more text lines represented by objects of the RTRTextLine class. Each RTRTextLine contains information about the enclosing quadrangle of a single line of text, and the recognized text as a string.
    Process the results as appropriate for your application; the delegate sketch after the list shows a minimal example.
  7. When pausing or quitting the application, call the stopTasks method to stop processing and clean up image buffers. The text capture service keeps its configuration settings (language, area of interest) and necessary resources. Processing will resume automatically on the next call to the addSampleBuffer: method (see the last sketch after this list).
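
The following sketches illustrate the steps above. First, a minimal delegate for steps 1 and 6, assuming a hypothetical CaptureViewController class that owns the service (the same class is reused in the later sketches). The umbrella header name, the id<RTRTextCaptureService> service type, and the onWarning:/onError: callbacks follow the SDK sample code rather than this guide, so verify them against the API Reference:

    #import <AbbyyRtrSDK/AbbyyRtrSDK.h>

    @interface CaptureViewController () <RTRTextCaptureServiceDelegate, AVCaptureVideoDataOutputSampleBufferDelegate>
    @property (nonatomic) AVCaptureSession* session;
    @property (nonatomic) id<RTRTextCaptureService> textCaptureService;
    @end

    @implementation CaptureViewController

    - (void)onBufferProcessedWithTextLines:(NSArray*)textLines
                              resultStatus:(RTRResultStabilityStatus)resultStatus
    {
        // Ignore results until they are stable enough to be shown.
        if (resultStatus < RTRResultStabilityAvailable) {
            return;
        }
        // The delegate may be called off the main thread; hop to the
        // main queue before touching the UI.
        dispatch_async(dispatch_get_main_queue(), ^{
            for (RTRTextLine* line in textLines) {
                NSLog(@"Recognized line: %@", line.text);
            }
        });
    }

    - (void)onWarning:(RTRCallbackWarningCode)warningCode
    {
        // For example, ask the user to hold the device steady.
    }

    - (void)onError:(NSError*)error
    {
        NSLog(@"Text capture error: %@", error);
    }

    @end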
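
For step 2, a sketch of engine creation; the license file name is a placeholder for the file shipped with your license package:

    // Load the license data from the application bundle.
    NSString* licensePath = [[NSBundle mainBundle] pathForResource:@"MyLicenseFile"
                                                            ofType:@"license"];
    NSData* licenseData = [NSData dataWithContentsOfFile:licensePath];

    // A nil engine means the license data was not accepted.
    RTREngine* engine = [RTREngine sharedEngineWithLicenseData:licenseData];
    NSAssert(engine != nil, @"RTREngine could not be created: check the license data");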
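
For steps 3 and 4, a combined creation-and-configuration sketch. The NSSet argument of setRecognitionLanguages: and the pixel coordinates passed to setAreaOfInterest: (sized here for the 1280×720 preset recommended in step 5) are assumptions to check against the API Reference:

    // One service instance is enough for the whole application.
    self.textCaptureService = [engine createTextCaptureServiceWithDelegate:self];

    // Recognize French instead of the default English.
    [self.textCaptureService setRecognitionLanguages:[NSSet setWithObject:@"French"]];

    // Limit recognition to a horizontal band in the middle of the
    // 1280x720 frame, leaving a margin between the band and the edges.
    CGFloat margin = 40.0;
    CGRect band = CGRectMake(margin, (720.0 - 200.0) / 2.0,
                             1280.0 - 2.0 * margin, 200.0);
    [self.textCaptureService setAreaOfInterest:band];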
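
For step 5, a sketch of the capture session and the video output delegate; the queue label is arbitrary and error handling is reduced to the essentials:

    - (void)setupCaptureSession
    {
        AVCaptureSession* session = [[AVCaptureSession alloc] init];
        session.sessionPreset = AVCaptureSessionPreset1280x720;

        // Camera input.
        AVCaptureDevice* camera = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
        NSError* error = nil;
        AVCaptureDeviceInput* input = [AVCaptureDeviceInput deviceInputWithDevice:camera error:&error];
        if (input != nil && [session canAddInput:input]) {
            [session addInput:input];
        }

        // Video output in the 32BGRA pixel format the service expects.
        AVCaptureVideoDataOutput* output = [[AVCaptureVideoDataOutput alloc] init];
        output.videoSettings = @{ (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA) };
        output.alwaysDiscardsLateVideoFrames = YES;
        dispatch_queue_t queue = dispatch_queue_create("com.example.video", DISPATCH_QUEUE_SERIAL);
        [output setSampleBufferDelegate:self queue:queue];
        if ([session canAddOutput:output]) {
            [session addOutput:output];
        }

        self.session = session;
        [session startRunning];
    }

    - (void)captureOutput:(AVCaptureOutput*)output
    didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
           fromConnection:(AVCaptureConnection*)connection
    {
        // Feed every preview frame to the text capture service.
        [self.textCaptureService addSampleBuffer:sampleBuffer];
    }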
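
Finally, for step 7, a sketch of stopping work when the screen goes away; feeding new frames via addSampleBuffer: restarts processing:

    - (void)viewWillDisappear:(BOOL)animated
    {
        [super viewWillDisappear:animated];
        // Stop delivering camera frames, then let the service
        // finish up and release its image buffers.
        [self.session stopRunning];
        [self.textCaptureService stopTasks];
    }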

See the description of classes and methods in the API Reference section.