Learn Image Text Recognition Using Google Cloud Vision API
Posted By : Mohd Adnan | 30-Apr-2018
According to Google Cloud Vision API Documentation -
Cloud Vision API enables developers to integrate Google CloudVision detection features within applications including face and landmark detection, image labeling, optical character recognition (OCR), and tagging of explicit content.
Prerequisites:
1. Before we begin, we need to set up a project on Google Cloud Developers console (Link: https://console.cloud.google.com/)
2. Enable the Google Cloud Vision API under 'API and Services'
3. Copy the API key under Credentials which looks like this 'AIzaSyCwpab-fbRd6*******ne60NyTkA'
4. Android Studio(3+) with
Let's try our hands on implementation:
Define Permissions in AndroidManifest.xml
Create an Activity where
Define Constants:
private static final String CLOUD_VISION_API_KEY =API_KEY;
public static final String FILE_NAME = "temp.jpg";
private static final String ANDROID_CERT_HEADER = "X-Android-Cert";
private static final String ANDROID_PACKAGE_HEADER = "X-Android-Package";
private static final int MAX_LABEL_RESULTS = 10;
private static final int MAX_DIMENSION = 1200;
private static final String TAG = MainActivity.class.getSimpleName();
private static final int GALLERY_PERMISSIONS_REQUEST = 0;
private static final int GALLERY_IMAGE_REQUEST = 1;
public static final int CAMERA_PERMISSIONS_REQUEST = 2;
public static final int CAMERA_IMAGE_REQUEST = 3;
Note: CLOUD_VISION_API_KEY is a variable where you'll have to define your API Key copied from Google Cloud developers console.
Create a function to start Device Camera
public void startCamera() {
if (PermissionUtils.requestPermission(
this,
CAMERA_PERMISSIONS_REQUEST,
Manifest.permission.READ_EXTERNAL_STORAGE,
Manifest.permission.CAMERA)) {
Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
Uri photoUri = FileProvider.getUriForFile(this, getApplicationContext().getPackageName() + ".provider", getCameraFile());
intent.putExtra(MediaStore.EXTRA_OUTPUT, photoUri);
intent.addFlags(Intent.FLAG_GRANT_READ_URI_PERMISSION);
startActivityForResult(intent, CAMERA_IMAGE_REQUEST);
}
}
Read the image captured onActivityResult
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
super.onActivityResult(requestCode, resultCode, data);
if (requestCode == CAMERA_IMAGE_REQUEST && resultCode == RESULT_OK) {
Uri photoUri = FileProvider.getUriForFile(this, getApplicationContext().getPackageName() + ".provider", getCameraFile());
uploadImage(photoUri);
}
}
Create functions to uploadImage to Google Cloud Storage for processing
public void uploadImage(Uri uri) {
if (uri != null) {
try {
// scale the image to save on bandwidth
Bitmap bitmap =
scaleBitmapDown(
MediaStore.Images.Media.getBitmap(getContentResolver(), uri),
MAX_DIMENSION);
callCloudVision(bitmap);
mMainImage.setImageBitmap(bitmap);
} catch (IOException e) {
Log.d(TAG, "Image picking failed because " + e.getMessage());
Toast.makeText(this, R.string.image_picker_error, Toast.LENGTH_LONG).show();
}
} else {
Log.d(TAG, "Image picker gave us a null image.");
Toast.makeText(this, R.string.image_picker_error, Toast.LENGTH_LONG).show();
}
}
private void callCloudVision(final Bitmap bitmap) {
// Switch text to loading
mImageDetails.setText(R.string.loading_message);
// Do the real work in an async task, because we need to use the network anyway
try {
AsyncTask textDetectionTask = new TextDetectionTask(this, prepareAnnotationRequest(bitmap));
labelDetectionTask.execute();
} catch (IOException e) {
Log.d(TAG, "failed to make API request because of other IOException " +
e.getMessage());
}
}
Create an AsyncTask that processes the image for text detection in
private Vision.Images.Annotate prepareAnnotationRequest(Bitmap bitmap) throws IOException {
HttpTransport httpTransport = AndroidHttp.newCompatibleTransport();
JsonFactory jsonFactory = GsonFactory.getDefaultInstance();
VisionRequestInitializer requestInitializer =
new VisionRequestInitializer(CLOUD_VISION_API_KEY) {
/**
* We override this so we can inject important identifying fields into the HTTP
* headers. This enables use of a restricted cloud platform API key.
*/
@Override
protected void initializeVisionRequest(VisionRequest visionRequest)
throws IOException {
super.initializeVisionRequest(visionRequest);
String packageName = getPackageName();
visionRequest.getRequestHeaders().set(ANDROID_PACKAGE_HEADER, packageName);
String sig = PackageManagerUtils.getSignature(getPackageManager(), packageName);
visionRequest.getRequestHeaders().set(ANDROID_CERT_HEADER, sig);
}
};
Vision.Builder builder = new Vision.Builder(httpTransport, jsonFactory, null);
builder.setVisionRequestInitializer(requestInitializer);
Vision vision = builder.build();
BatchAnnotateImagesRequest batchAnnotateImagesRequest =
new BatchAnnotateImagesRequest();
batchAnnotateImagesRequest.setRequests(new ArrayList() {{
AnnotateImageRequest annotateImageRequest = new AnnotateImageRequest();
// Add the image
Image base64EncodedImage = new Image();
// Convert the bitmap to a JPEG
// Just in case it's a format that Android understands but Cloud Vision
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
bitmap.compress(Bitmap.CompressFormat.JPEG, 90, byteArrayOutputStream);
byte[] imageBytes = byteArrayOutputStream.toByteArray();
// Base64 encode the JPEG
base64EncodedImage.encodeContent(imageBytes);
annotateImageRequest.setImage(base64EncodedImage);
// add the features we want
annotateImageRequest.setFeatures(new ArrayList() {{
Feature labelDetection = new Feature();
textDetection.setType("DOCUMENT_TEXT_DETECTION");
add(labelDetection);
}});
// Add the list of one thing to the request
add(annotateImageRequest);
}});
Vision.Images.Annotate annotateRequest =
vision.images().annotate(batchAnnotateImagesRequest);
// Due to a bug: requests to Vision API containing large images fail when GZipped.
annotateRequest.setDisableGZipContent(true);
Log.d(TAG, "created Cloud Vision request object, sending request");
return annotateRequest;
}
private static class TextDetectionTask extends AsyncTask {
private final WeakReference mActivityWeakReference;
private Vision.Images.Annotate mRequest;
TextDetectionTask(MainActivity activity, Vision.Images.Annotate annotate) {
mActivityWeakReference = new WeakReference<>(activity);
mRequest = annotate;
}
@Override
protected String doInBackground(Object... params) {
try {
Log.d(TAG, "created Cloud Vision request object, sending request");
BatchAnnotateImagesResponse response = mRequest.execute();
return convertResponseToString(response);
} catch (GoogleJsonResponseException e) {
Log.d(TAG, "failed to make API request because " + e.getContent());
} catch (IOException e) {
Log.d(TAG, "failed to make API request because of other IOException " +
e.getMessage());
}
return "Cloud Vision API request failed. Check logs for details.";
}
protected void onPostExecute(String result) {
MainActivity activity = mActivityWeakReference.get();
if (activity != null && !activity.isFinishing()) {
TextView imageDetail = activity.findViewById(R.id.image_details);
imageDetail.setText(result);
}
}
}
Create a function that will scale down the bitmap captured to make processing faster
private Bitmap scaleBitmapDown(Bitmap bitmap, int maxDimension) {
int originalWidth = bitmap.getWidth();
int originalHeight = bitmap.getHeight();
int resizedWidth = maxDimension;
int resizedHeight = maxDimension;
if (originalHeight > originalWidth) {
resizedHeight = maxDimension;
resizedWidth = (int) (resizedHeight * (float) originalWidth / (float) originalHeight);
} else if (originalWidth > originalHeight) {
resizedWidth = maxDimension;
resizedHeight = (int) (resizedWidth * (float) originalHeight / (float) originalWidth);
} else if (originalHeight == originalWidth) {
resizedHeight = maxDimension;
resizedWidth = maxDimension;
}
return Bitmap.createScaledBitmap(bitmap, resizedWidth, resizedHeight, false);
}
Finally, create a function to display the text extracted from
private static String convertResponseToString(BatchAnnotateImagesResponse response) {
String result = "";
TextAnnotation fullTextAnnotation = response.getResponses().get(0).getFullTextAnnotation();
if (fullTextAnnotation != null) {
result = fullTextAnnotation.get("text").toString();
} else {
result = "";
}
return result;
}
The variable result will contain the text extracted from
Hope that helps :)
About Author
Mohd Adnan
Adnan, an experienced Backend Developer, boasts a robust expertise spanning multiple technologies, prominently Java. He possesses an extensive grasp of cutting-edge technologies and boasts hands-on proficiency in Core Java, Spring Boot, Hibernate, Apache Kafka messaging queue, Redis, as well as relational databases such as MySQL and PostgreSQL. Adnan consistently delivers invaluable contributions to a variety of client projects, including Vision360 (UK) - Konfer, Bitsclan, Yogamu, Bill Barry DevOps support, enhedu.com, Noorisys, One Infinity- DevOps Setup, and more. He exhibits exceptional analytical skills alongside a creative mindset. Moreover, he possesses a fervent passion for reading books and exploring novel technologies and innovations.