Learn Image Text Recognition Using Google Cloud Vision API

Posted By : Mohd Adnan | 30-Apr-2018

android

Google CloudVision API

According to Google Cloud Vision API Documentation -

Cloud Vision API enables developers to integrate Google CloudVision detection features within applications including face and landmark detection, image labeling, optical character recognition (OCR), and tagging of explicit content.

Prerequisites:
1. Before we begin, we need to set up a project on Google Cloud Developers console (Link: https://console.cloud.google.com/)
2. Enable the Google Cloud Vision API under 'API and Services'
3. Copy the API key under Credentials which looks like this 'AIzaSyCwpab-fbRd6*******ne60NyTkA'
4. Android Studio(3+) with latest SDK

Let's try our hands on implementation:

Define Permissions in AndroidManifest.xml

Create an Activity where request to Text recognition using Google CloudVision API will be processed

Define Constants:

    private static final String CLOUD_VISION_API_KEY =API_KEY;
    public static final String FILE_NAME = "temp.jpg";
    private static final String ANDROID_CERT_HEADER = "X-Android-Cert";
    private static final String ANDROID_PACKAGE_HEADER = "X-Android-Package";
    private static final int MAX_LABEL_RESULTS = 10;
    private static final int MAX_DIMENSION = 1200;

    private static final String TAG = MainActivity.class.getSimpleName();
    private static final int GALLERY_PERMISSIONS_REQUEST = 0;
    private static final int GALLERY_IMAGE_REQUEST = 1;
    public static final int CAMERA_PERMISSIONS_REQUEST = 2;
    public static final int CAMERA_IMAGE_REQUEST = 3;

Note: CLOUD_VISION_API_KEY is a variable where you'll have to define your API Key copied from Google Cloud developers console.

Create a function to start Device Camera

  public void startCamera() {
        if (PermissionUtils.requestPermission(
                this,
                CAMERA_PERMISSIONS_REQUEST,
                Manifest.permission.READ_EXTERNAL_STORAGE,
                Manifest.permission.CAMERA)) {
            Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
            Uri photoUri = FileProvider.getUriForFile(this, getApplicationContext().getPackageName() + ".provider", getCameraFile());
            intent.putExtra(MediaStore.EXTRA_OUTPUT, photoUri);
            intent.addFlags(Intent.FLAG_GRANT_READ_URI_PERMISSION);
            startActivityForResult(intent, CAMERA_IMAGE_REQUEST);
        }
    }

Read the image captured onActivityResult

 @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);

       if (requestCode == CAMERA_IMAGE_REQUEST && resultCode == RESULT_OK) {
            Uri photoUri = FileProvider.getUriForFile(this, getApplicationContext().getPackageName() + ".provider", getCameraFile());
            uploadImage(photoUri);
        }
    }

Create functions to uploadImage to Google Cloud Storage for processing

 public void uploadImage(Uri uri) {
        if (uri != null) {
            try {
                // scale the image to save on bandwidth
                Bitmap bitmap =
                        scaleBitmapDown(
                                MediaStore.Images.Media.getBitmap(getContentResolver(), uri),
                                MAX_DIMENSION);

                callCloudVision(bitmap);
                mMainImage.setImageBitmap(bitmap);

            } catch (IOException e) {
                Log.d(TAG, "Image picking failed because " + e.getMessage());
                Toast.makeText(this, R.string.image_picker_error, Toast.LENGTH_LONG).show();
            }
        } else {
            Log.d(TAG, "Image picker gave us a null image.");
            Toast.makeText(this, R.string.image_picker_error, Toast.LENGTH_LONG).show();
        }
    }


 private void callCloudVision(final Bitmap bitmap) {
        // Switch text to loading
        mImageDetails.setText(R.string.loading_message);

        // Do the real work in an async task, because we need to use the network anyway
        try {
            AsyncTask textDetectionTask = new TextDetectionTask(this, prepareAnnotationRequest(bitmap));
            labelDetectionTask.execute();
        } catch (IOException e) {
            Log.d(TAG, "failed to make API request because of other IOException " +
                    e.getMessage());
        }
    }

Create an AsyncTask that processes the image for text detection in background thread

 private Vision.Images.Annotate prepareAnnotationRequest(Bitmap bitmap) throws IOException {
        HttpTransport httpTransport = AndroidHttp.newCompatibleTransport();
        JsonFactory jsonFactory = GsonFactory.getDefaultInstance();

        VisionRequestInitializer requestInitializer =
                new VisionRequestInitializer(CLOUD_VISION_API_KEY) {
                    /**
                     * We override this so we can inject important identifying fields into the HTTP
                     * headers. This enables use of a restricted cloud platform API key.
                     */
                    @Override
                    protected void initializeVisionRequest(VisionRequest visionRequest)
                            throws IOException {
                        super.initializeVisionRequest(visionRequest);

                        String packageName = getPackageName();
                        visionRequest.getRequestHeaders().set(ANDROID_PACKAGE_HEADER, packageName);

                        String sig = PackageManagerUtils.getSignature(getPackageManager(), packageName);

                        visionRequest.getRequestHeaders().set(ANDROID_CERT_HEADER, sig);
                    }
                };

        Vision.Builder builder = new Vision.Builder(httpTransport, jsonFactory, null);
        builder.setVisionRequestInitializer(requestInitializer);

        Vision vision = builder.build();

        BatchAnnotateImagesRequest batchAnnotateImagesRequest =
                new BatchAnnotateImagesRequest();
        batchAnnotateImagesRequest.setRequests(new ArrayList() {{
            AnnotateImageRequest annotateImageRequest = new AnnotateImageRequest();

            // Add the image
            Image base64EncodedImage = new Image();
            // Convert the bitmap to a JPEG
            // Just in case it's a format that Android understands but Cloud Vision
            ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
            bitmap.compress(Bitmap.CompressFormat.JPEG, 90, byteArrayOutputStream);
            byte[] imageBytes = byteArrayOutputStream.toByteArray();

            // Base64 encode the JPEG
            base64EncodedImage.encodeContent(imageBytes);
            annotateImageRequest.setImage(base64EncodedImage);

            // add the features we want
            annotateImageRequest.setFeatures(new ArrayList() {{
                Feature labelDetection = new Feature();
                textDetection.setType("DOCUMENT_TEXT_DETECTION");
                add(labelDetection);
            }});

            // Add the list of one thing to the request
            add(annotateImageRequest);
        }});

        Vision.Images.Annotate annotateRequest =
                vision.images().annotate(batchAnnotateImagesRequest);
        // Due to a bug: requests to Vision API containing large images fail when GZipped.
        annotateRequest.setDisableGZipContent(true);
        Log.d(TAG, "created Cloud Vision request object, sending request");

        return annotateRequest;
    }

    private static class TextDetectionTask extends AsyncTask {
        private final WeakReference mActivityWeakReference;
        private Vision.Images.Annotate mRequest;

        TextDetectionTask(MainActivity activity, Vision.Images.Annotate annotate) {
            mActivityWeakReference = new WeakReference<>(activity);
            mRequest = annotate;
        }

        @Override
        protected String doInBackground(Object... params) {
            try {
                Log.d(TAG, "created Cloud Vision request object, sending request");
                BatchAnnotateImagesResponse response = mRequest.execute();
                return convertResponseToString(response);

            } catch (GoogleJsonResponseException e) {
                Log.d(TAG, "failed to make API request because " + e.getContent());
            } catch (IOException e) {
                Log.d(TAG, "failed to make API request because of other IOException " +
                        e.getMessage());
            }
            return "Cloud Vision API request failed. Check logs for details.";
        }

        protected void onPostExecute(String result) {
            MainActivity activity = mActivityWeakReference.get();
            if (activity != null && !activity.isFinishing()) {
                TextView imageDetail = activity.findViewById(R.id.image_details);
                imageDetail.setText(result);
            }
        }
    }

Create a function that will scale down the bitmap captured to make processing faster

private Bitmap scaleBitmapDown(Bitmap bitmap, int maxDimension) {

        int originalWidth = bitmap.getWidth();
        int originalHeight = bitmap.getHeight();
        int resizedWidth = maxDimension;
        int resizedHeight = maxDimension;

        if (originalHeight > originalWidth) {
            resizedHeight = maxDimension;
            resizedWidth = (int) (resizedHeight * (float) originalWidth / (float) originalHeight);
        } else if (originalWidth > originalHeight) {
            resizedWidth = maxDimension;
            resizedHeight = (int) (resizedWidth * (float) originalHeight / (float) originalWidth);
        } else if (originalHeight == originalWidth) {
            resizedHeight = maxDimension;
            resizedWidth = maxDimension;
        }
        return Bitmap.createScaledBitmap(bitmap, resizedWidth, resizedHeight, false);
    }

Finally, create a function to display the text extracted from image

   private static String convertResponseToString(BatchAnnotateImagesResponse response) {
        String result = "";

        TextAnnotation fullTextAnnotation = response.getResponses().get(0).getFullTextAnnotation();
        if (fullTextAnnotation != null) {
            result = fullTextAnnotation.get("text").toString();
        } else {
            result = "";
        }
        return result;
    }

The variable result will contain the text extracted from image.

Hope that helps :)

Related Tags

About Author

Mohd Adnan

Adnan, an experienced Backend Developer, boasts a robust expertise spanning multiple technologies, prominently Java. He possesses an extensive grasp of cutting-edge technologies and boasts hands-on proficiency in Core Java, Spring Boot, Hibernate, Apache Kafka messaging queue, Redis, as well as relational databases such as MySQL and PostgreSQL. Adnan consistently delivers invaluable contributions to a variety of client projects, including Vision360 (UK) - Konfer, Bitsclan, Yogamu, Bill Barry DevOps support, enhedu.com, Noorisys, One Infinity- DevOps Setup, and more. He exhibits exceptional analytical skills alongside a creative mindset. Moreover, he possesses a fervent passion for reading books and exploring novel technologies and innovations.

Ready to innovate? Let's get in touch

Attach files

Recaptcha is required.

Backend

Full Stack

Frontend

Blockchain

Mobile

Video Streaming

E-commerce

ERP

CMS

Devops

AR/VR

Software Development Services

Metaverse Innovation & Consulting

Digital Experience

Digital Trivergence

Data Services

Scaffold

Company

Learn Image Text Recognition Using Google Cloud Vision API

Posted By : Mohd Adnan | 30-Apr-2018

Related Tags

About Author

Mohd Adnan

More From Oodles

Mastering Combine in iOS

A Beginner-to-Advanced Guide with Code Examples for Combine

Neha N | 14-Mar-2025

API Calls in Flutter

A crucial aspect of any modern app is its ability to fetch and send data over the internet. In this blog, we will explore different ways to make API calls in Flutter.

Abhishek Jha | 14-Mar-2025

Implementing File Downloads in iOS WKWebView Using WKDownloadDelegate

Learn how to enable file downloads in iOS apps by implementing the WKDownloadDelegate with WKWebView, ensuring a seamless user experience for handling various file types.

Sanan Husain | 05-Feb-2025

Ready to innovate? Let's get in touch

Valued Services

Resources

Expertise

Connect with us

© Copyright 2025 Oodles Technologies Pvt Ltd. All rights reserved.