Capture Text from Camera with Android


This guide walks you through a simple real-time text capture scenario, in which the user points the device's camera at the text to be recognized.

How it Works

Real-Time Recognition SDK for Android enables your application to capture information directly from the smartphone camera preview frames, without actually snapping a picture. Once you start capturing, the Real-Time Recognition SDK engine automatically requests new camera frames and processes them, using each new frame to verify and improve the recognition result obtained from the previous frames. This continues until the result reaches the required stability level. Combining several images lets Real-Time Recognition SDK recognize text even in situations where a still photo of suitable quality for recognition is hard to obtain.

Note that Real-Time Recognition SDK also supports recognizing text on an image that was already saved to a file, which allows it to process existing photos, scanned texts, and so on. See Recognize Text on Photos for the description of this scenario.


Note: Before you begin, see Build your application with the OCR library for Android.

To implement the real-time text capture scenario during Android OCR development, follow these steps:

  1. Begin by implementing the Callback interface. Its methods pass data to and from the recognition service; steps 6 and 7 below describe what the key methods should do.
  2. Call the Engine.load method on the UI thread to create an engine object, through which all other objects are created. Reuse this object for every new operation; do not create it again in the same activity.
  3. Use the createTextCaptureService method of the Engine object to create a background recognition service (implementing the ITextCaptureService interface) on the UI thread. Only one instance of the service per application is necessary: multiple threads will be started internally.
  4. Set up the processing parameters, according to the kind of text you expect to capture.
    The default text language is English; if you need other languages, specify them using the setRecognitionLanguage method.
  5. When the camera is ready, call the start method of the ITextCaptureService interface. Its required input parameters are the size and orientation of the video frame and the rectangular area in which to search for text (e.g. if your application displays a highlighted rectangle in the center of the image, specify this rectangle as the "area of interest").
    The service will then start up several working threads and continue interacting with your application via the Callback interface.
  6. Whenever the Callback.onRequestLatestFrame method is called, provide the current video frame from the camera by calling ITextCaptureService.submitRequestedFrame.
  7. The Callback.onFrameProcessed method will be called on the UI thread to return the result when the frame is recognized.
    It also reports the result stability status, which indicates whether a result is available and whether it is likely to improve as more frames are added (see the resultStatus parameter). Use it to decide when the application should stop processing and display the result to the user. We do not recommend using the result until the stability level has reached at least Available.
    The result consists of one or more text lines represented by objects of the TextLine class. Each TextLine contains information about the enclosing quadrangle for a single line of text and the recognized text as a string.
    Process the recognized lines in your application as needed.
  8. When pausing or quitting the application, call the ITextCaptureService.stop method to terminate the processing threads.
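The callback-driven loop in steps 5–8 can be sketched with a dependency-free mock. This is not the real SDK: `MockTextCaptureService`, the simplified `Callback` signatures, and the rule that each frame raises stability by one level are all stand-ins for illustration; only the stability level names and the overall call sequence (request frame → process → report status → stop at Available) follow the description above.

```java
import java.util.ArrayList;
import java.util.List;

public class TextCaptureFlow {

    // Stability levels reported with each processed frame, weakest to strongest.
    enum ResultStabilityStatus { NotReady, Tentative, Verified, Available, TentativelyStable, Stable }

    // Simplified stand-in for the SDK callback interface described in steps 6 and 7.
    interface Callback {
        void onRequestLatestFrame(byte[] buffer);
        void onFrameProcessed(List<String> lines, ResultStabilityStatus status);
    }

    // Mock service: each submitted frame raises the stability level by one step.
    static class MockTextCaptureService {
        private final Callback callback;
        private int framesSeen = 0;
        private boolean running = false;

        MockTextCaptureService(Callback callback) { this.callback = callback; }

        void start() {
            running = true;
            while (running) {
                byte[] frame = new byte[16];
                callback.onRequestLatestFrame(frame);   // service asks for a frame (step 6)
                framesSeen++;
                ResultStabilityStatus status =
                        ResultStabilityStatus.values()[Math.min(framesSeen, 5)];
                callback.onFrameProcessed(List.of("Hello"), status);   // step 7
            }
        }

        void stop() { running = false; }   // step 8
    }

    // Runs the loop until the result reaches at least Available; returns frames used.
    static int runUntilAvailable() {
        List<ResultStabilityStatus> seen = new ArrayList<>();
        MockTextCaptureService[] holder = new MockTextCaptureService[1];
        Callback callback = new Callback() {
            public void onRequestLatestFrame(byte[] buffer) {
                // In a real app: copy the latest camera preview bytes into the buffer
                // and hand them back via submitRequestedFrame.
            }
            public void onFrameProcessed(List<String> lines, ResultStabilityStatus status) {
                seen.add(status);
                if (status.ordinal() >= ResultStabilityStatus.Available.ordinal()) {
                    holder[0].stop();   // result is good enough: stop processing
                }
            }
        };
        MockTextCaptureService service = new MockTextCaptureService(callback);
        holder[0] = service;
        service.start();
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("frames until Available: " + runUntilAvailable());
    }
}
```

In this mock the stability reaches Available after three frames; with a real camera the number of frames depends on image quality, which is exactly why the application should watch the status rather than assume a fixed frame count.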
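The result structure described in step 7 can be modeled in a few lines. `TextLine` and `Point` below are illustrative stand-ins for the SDK classes, assuming only what the text above states: each line carries its recognized string and the quadrangle enclosing it in frame coordinates.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RecognitionResult {

    record Point(int x, int y) {}

    // Stand-in for the SDK's TextLine: recognized text plus its enclosing quadrangle.
    record TextLine(String text, Point[] quadrangle) {}

    // Join the recognized lines into a single multi-line display string.
    static String toDisplayText(List<TextLine> lines) {
        return lines.stream().map(TextLine::text).collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        TextLine line1 = new TextLine("Hello", new Point[] {
                new Point(10, 10), new Point(110, 10), new Point(110, 40), new Point(10, 40) });
        TextLine line2 = new TextLine("world", new Point[] {
                new Point(10, 50), new Point(110, 50), new Point(110, 80), new Point(10, 80) });
        System.out.println(toDisplayText(List.of(line1, line2)));   // prints "Hello" then "world"
    }
}
```

The quadrangle is useful for drawing a live highlight over each recognized line in the camera preview; the joined text is what you would hand to the rest of your application.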

See the description of classes and methods for Android OCR development in the API Reference section.