OCR Images and PDFs using Google Docs
INTRO
This guide has been revived and updated following a website reshuffle in which the original guide was completely lost! The basics have not changed since the original was written in February 2019, but the methods have been improved. Google Docs does a very good job of OCRing images that contain text, even those taken with a phone or tablet camera, and it can also extract the text from PDF files. This method does rely on an internet connection.
Example image of text to be OCR'd (this is the lorem.png image in the app)
Example of OCR'd text
The demo app shows how this is done: it returns the text to a label and allows the user to save that text to a TinyDB. A Google Apps Script web app provides the magic: it receives the uploaded file (in base64), opens it in a Google Doc for OCRing, then returns the text. Both the image/PDF and the created Google Doc are deleted from Google Drive on success. The demo app tests two different types of image and a sample PDF containing text, and allows the user to take photos of written text with the device camera for OCR. A standalone Google Apps Script project is used for this demo. You can use a project bound to a spreadsheet if you find this easier, but the spreadsheet is not needed. The demo app was developed and tested in companion and compiled versions on Android 10 and 11, and the file paths used reflect this. There are no blocks to handle earlier Android versions.
SETUP
What do we need?
Google Apps Script Web App
Set up a project in the usual way
Copy the script code below into the project
Enter your own folder ID in the correct place
Publish as a web app, executing as "you" and accessible by "anyone, even anonymous"
Get the script URL for use in the App Inventor App
Also make a note of the filename you give to your project (you can't find it again using just the script URL!). I have included mine in the demo app for safekeeping.
When creating the script, you will need to add the Advanced Drive Service. With the legacy editor, go to Resources > Advanced Google Services, then set "Drive API" to "On". With the new script editor, use Services > Drive API. Use Version 2 of the Drive API, because the script calls Drive.Files.insert.
App Inventor App
The blocks and the demo app aia are provided below
The very basics required are as follows:
Camera component
Web component
In the PostText you must provide:
the base64 encoded string of the file
the mimetype for the file (e.g. image/png)
a filename (e.g. image1.png)
Script URL
Sunny Gupta's Filey extension for base64 encoding
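The three PostText values can be sketched in JavaScript as follows. This is only an illustration, assuming the body is sent as URL-encoded key=value pairs using the parameter names the script reads (data, mimetype, filename). The helper name buildPostText and the sample payload are hypothetical. Encoding matters because base64 text can contain "+", "/" and "=", and an unencoded "+" is read as a space in form data.

```javascript
// Hypothetical helper: builds the form-encoded text for the POST body.
// data, mimetype and filename are the parameter names the script reads.
function buildPostText(base64Data, mimetype, filename) {
  return [
    "data=" + encodeURIComponent(base64Data),
    "mimetype=" + encodeURIComponent(mimetype),
    "filename=" + encodeURIComponent(filename)
  ].join("&");
}

// Placeholder payload, not a real image
var postText = buildPostText("ab+/cd==", "image/png", "image1.png");
console.log(postText);
```

In the blocks, the Web component's UriEncode method should play the role that encodeURIComponent plays here.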
I have added more elements for usability:
A few procedures to extract the filename and the file extension
A hard coded procedure to create the mimetype required (there are only a few filetypes that can be used)
Use of Taifun's File extension to provide access to files in assets
Note the need to modify the Camera image path for use in Filey
A TinyDB and ListView to store and display the OCR texts
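For readers who think more easily in text code than in blocks, the filename, extension and mimetype procedures might look roughly like this in JavaScript. The function names, the example path and the exact list of file types are assumptions for illustration (the list mirrors the types named in the script's comment). The demo app's blocks may differ.

```javascript
// Hypothetical helpers mirroring the procedures described above.
function getFilename(path) {
  // Everything after the last "/" (the whole string if there is none)
  return path.substring(path.lastIndexOf("/") + 1);
}

function getExtension(filename) {
  // Lower-cased text after the last ".", or "" if there is no "."
  var dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.substring(dot + 1).toLowerCase();
}

// Hard-coded lookup of the few usable file types (assumed list)
var MIMETYPES = {
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  png: "image/png",
  gif: "image/gif",
  pdf: "application/pdf"
};

function getMimetype(extension) {
  // Returns null for unsupported types so the caller can refuse the upload
  return Object.prototype.hasOwnProperty.call(MIMETYPES, extension)
    ? MIMETYPES[extension]
    : null;
}

var path = "/storage/emulated/0/Pictures/photo_1.JPG"; // example path only
var name = getFilename(path);
console.log(name, getExtension(name), getMimetype(getExtension(name)));
```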
SCRIPT
// Requires a base64 encoded file, the file's mimetype, and a filename
function doPost(e) {
  var status;
  // e.parameter holds each posted value as a plain string
  var data = Utilities.base64Decode(e.parameter.data);
  var blob = Utilities.newBlob(data, e.parameter.mimetype, e.parameter.filename);
  // Provide here a folder ID for the creation of the uploaded file
  var fileID = DriveApp.getFolderById('your folder ID here').createFile(blob).getId();
  if (fileID !== "") {
    try {
      // Fetch the uploaded file from Drive
      var imageBlob = DriveApp.getFileById(fileID).getBlob();
      var resource = { title: e.parameter.filename, mimeType: imageBlob.getContentType() };
      // OCR on .jpg, .png, .gif (or .pdf) uploads; needs the Advanced Drive Service (v2)
      var options = { ocr: true };
      var docFile = Drive.Files.insert(resource, imageBlob, options);
      var doc = DocumentApp.openById(docFile.id);
      // Extract the text body of the Google Document
      // (a string pattern removes only the first newline)
      var text = doc.getBody().getText().replace("\n", "");
      // Permanently delete the Google Doc and the uploaded file (this skips the trash)
      Drive.Files.remove(docFile.id);
      Drive.Files.remove(fileID);
      status = text;
    } catch (error) {
      status = "ERROR with OCR: " + error.toString();
    }
  } else {
    status = "ERROR with OCR: No image specified";
  }
  return ContentService.createTextOutput(status);
}
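On the receiving side, the response text is either the OCR result or a message starting with the "ERROR with OCR: " prefix used by the script. A minimal sketch of telling the two apart is given here in JavaScript for illustration. The helper name parseOcrResponse is hypothetical, and collapsing the remaining line breaks into spaces is an optional choice, not something the script requires. It is included because replace("\n", "") in the script removes only the first newline.

```javascript
// Prefix used by the script for every failure message
var ERROR_PREFIX = "ERROR with OCR: ";

// Hypothetical helper: splits the web app response into result or error.
function parseOcrResponse(body) {
  if (body.indexOf(ERROR_PREFIX) === 0) {
    return { ok: false, error: body.substring(ERROR_PREFIX.length), text: "" };
  }
  // Optional tidy-up: collapse each remaining line break into one space
  var text = body.replace(/\s*\n\s*/g, " ").trim();
  return { ok: true, error: "", text: text };
}

console.log(parseOcrResponse("Lorem ipsum\ndolor sit\n"));
console.log(parseOcrResponse("ERROR with OCR: No image specified"));
```

In the app, the same check could decide whether to show the text in the label and offer to save it, or to show the error instead.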