OCR Images and PDFs using Google Docs

INTRO

This guide has been revived and updated after a website reshuffle in which the original guide was completely lost. The basics have not changed since the original was written in February 2019, but the methods have been improved. Google Docs does a very good job of OCRing images that contain text, even those taken with a phone or tablet camera, and it can also extract the text from PDF files. This method requires an internet connection.

Example image of text to be OCR'd (this is the lorem.png image in the app)

Example of OCR'd text

The demo app shows how this is done: it returns the text to a label and lets the user save that text to a TinyDB. A Google Apps Script web app does the work of receiving the uploaded file (base64 encoded), saving it to Google Drive, converting it to a Google Doc with OCR, and returning the text. Both the image/PDF and the created Google Doc are deleted from Google Drive on success.

The demo app tests two different types of image and a sample PDF containing text, and also lets the user take photos of written text with the device camera for OCR.

A standalone Google Apps Script project is used for this demo. You can use a project bound to a spreadsheet if you find this easier, but the spreadsheet is not needed. The demo app was developed and tested in companion and compiled versions on Android 10 and 11, and the file paths used reflect this. There are no blocks to handle earlier Android versions.
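To make the upload step more concrete, here is a minimal sketch in plain JavaScript (Node 18 or later) of the request a client would send to the web app. It is not part of the demo app, which builds the request with blocks. The helper names buildOcrRequestBody and sendForOcr and the webAppUrl argument are hypothetical; the three parameter names (data, mimetype, filename) are the ones the script reads.

```javascript
// Hypothetical helper: builds the form-encoded body that the script's
// doPost(e) reads (parameters: data, mimetype, filename).
function buildOcrRequestBody(fileBytes, mimetype, filename) {
  const params = new URLSearchParams();
  // The file travels as standard base64; URLSearchParams percent-encodes
  // the "+", "/" and "=" characters so they survive the form encoding.
  params.set("data", Buffer.from(fileBytes).toString("base64"));
  params.set("mimetype", mimetype);
  params.set("filename", filename);
  return params.toString();
}

// Hypothetical sender: webAppUrl stands for your own deployed web app URL.
// Resolves to whatever text the script returns.
async function sendForOcr(webAppUrl, fileBytes, mimetype, filename) {
  const response = await fetch(webAppUrl, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: buildOcrRequestBody(fileBytes, mimetype, filename),
  });
  return response.text();
}
```

Sending the three values as ordinary form parameters keeps the script side simple, at the cost of base64 making the upload roughly a third larger than the original file.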

SETUP

What do we need?

SCRIPT

// Requires three POST parameters: data (the base64-encoded file),
// mimetype (the file's MIME type) and filename.
// Drive.Files.insert and Drive.Files.remove come from the Drive advanced
// service (API v2), which must be enabled in the script project.
function doPost(e) {
  var status;
  // e.parameter holds single string values; e.parameters holds arrays
  var data = Utilities.base64Decode(e.parameter.data);
  var blob = Utilities.newBlob(data, e.parameter.mimetype, e.parameter.filename);
  // Provide here a folder ID for the creation of the uploaded file
  var fileID = DriveApp.getFolderById('your folder ID here').createFile(blob).getId();

  if (fileID !== "") {
    try {
      // Fetch the uploaded file from Drive
      var imageBlob = DriveApp.getFileById(fileID).getBlob();
      var resource = { title: e.parameter.filename, mimeType: imageBlob.getContentType() };
      // OCR on .jpg, .png, .gif (or .pdf) uploads
      var options = { ocr: true };
      var docFile = Drive.Files.insert(resource, imageBlob, options);
      var doc = DocumentApp.openById(docFile.id);
      // Extract the text body of the Google Document
      // (a string pattern in replace() removes only the first newline)
      var text = doc.getBody().getText().replace("\n", "");
      // Permanently delete both files (Drive.Files.remove bypasses the trash)
      Drive.Files.remove(docFile.id);
      Drive.Files.remove(fileID);
      status = text;
    } catch (error) {
      status = "ERROR with OCR: " + error.toString();
    }
  } else {
    status = "ERROR with OCR: No image specified";
  }
  return ContentService.createTextOutput(status);
}
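Because the script returns either the OCR'd text or a message beginning with "ERROR with OCR: ", the caller has to tell the two apart. The following is one possible way to do that in plain JavaScript; parseOcrResponse is a hypothetical helper name, and the only thing taken from the script is the error prefix.

```javascript
// Prefix used by the script for both of its error messages.
const ERROR_PREFIX = "ERROR with OCR: ";

// Hypothetical helper: splits the web app's plain-text response into
// a success result (the OCR'd text) or a failure result (the error detail).
function parseOcrResponse(body) {
  if (body.startsWith(ERROR_PREFIX)) {
    return { ok: false, error: body.slice(ERROR_PREFIX.length) };
  }
  return { ok: true, text: body };
}
```

One limitation of this approach: a scanned page whose own text happens to start with the same prefix would be misread as an error, since success and failure share a single plain-text channel.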

BLOCKS

VIDEO

RESOURCES

Credits to @Taifun and @Sunny for their extensions