[Linux] How to “read” your images, OCR with python

Hello everybody,

Today I’m going to show how you can extract text from images in Python with the help of Tesseract OCR. All the commands are written for this environment:

  • Python 3.8.3
  • Ubuntu 20.04.1 LTS (Focal Fossa)

Let’s get started.

Installing the software

First of all, we need to install tesseract and all its dependencies:

$ sudo apt install tesseract-ocr
$ sudo apt install libtesseract-dev
$ sudo apt install tesseract-ocr-ita 

So, first we install the OCR engine, then its development library, and lastly a language pack (here the Italian one, tesseract-ocr-ita) so that tesseract can recognize text in that language more accurately.
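An installed language pack is only used when tesseract is asked for it. As a minimal sketch, this is how the Italian pack might be selected through the `lang` parameter of `pytesseract.image_to_string`. The helper names `language_spec` and `ocr_with_languages` are hypothetical, and the imports are deferred so the snippet still loads before the Python modules from the next step are installed.

```python
def language_spec(codes):
    """Join Tesseract language codes with '+', e.g. ['ita', 'eng'] -> 'ita+eng'."""
    cleaned = [c.strip() for c in codes if c.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_with_languages(image_path, codes=("ita",)):
    """Run OCR on image_path with the given language packs (hypothetical helper)."""
    # Imported here so this module loads even if pytesseract is not installed yet.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=language_spec(codes))
```

For a document that mixes Italian and English, a call such as `ocr_with_languages("scan.png", ["ita", "eng"])` would ask for both packs; `scan.png` is a placeholder file name.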

After this, we install some Python modules to call tesseract easily and to preprocess images:

$ pip install pytesseract
$ pip install opencv-python

Essentially, pytesseract lets you call tesseract from Python in a very easy way, while OpenCV (imported as cv2) lets you preprocess the image to improve OCR accuracy.
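Before running any OCR, it can help to confirm that everything installed above is actually visible to Python. The following is a small sketch using only the standard library; the function name `check_ocr_environment` is hypothetical.

```python
import importlib.util
import shutil


def check_ocr_environment():
    """Report whether the tesseract binary and the Python modules can be found."""
    return {
        # The tesseract executable must be on PATH for pytesseract to call it.
        "tesseract": shutil.which("tesseract") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        "cv2": importlib.util.find_spec("cv2") is not None,
        "PIL": importlib.util.find_spec("PIL") is not None,
    }


if __name__ == "__main__":
    for name, found in check_ocr_environment().items():
        print("{}: {}".format(name, "ok" if found else "MISSING"))
```

Any entry reported as MISSING points to the install command that needs to be run again.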

Sample Code and Example

Here is a sample script that preprocesses the image and extracts its text:

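import argparse
import os

import cv2
import pytesseract
from PIL import Image


def extract_text(image_path):
    # Load the image; cv2.imread returns None if the file cannot be read
    im = cv2.imread(image_path)
    if im is None:
        raise FileNotFoundError("cannot read image: {}".format(image_path))
    # Convert to grayscale, then binarize:
    # pixels brighter than 180 become white (255), all others black (0)
    imgray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
    ret, thresh1 = cv2.threshold(imgray, 180, 255, cv2.THRESH_BINARY)
    # Write the processed image to a temporary file named after the process ID
    filename = "{}.png".format(os.getpid())
    cv2.imwrite(filename, thresh1)
    try:
        # Run tesseract on the processed image
        with Image.open(filename) as img:
            text = pytesseract.image_to_string(img)
    finally:
        # Remove the temporary file even if OCR fails
        os.remove(filename)
    return text


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True,
        help="path to input image to be OCR'd")
    args = vars(ap.parse_args())
    print(extract_text(args["image"]))


if __name__ == "__main__":
    main()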

Here is the output with a sample image:

As you can see, we first convert the image to grayscale, then “remove” the highlights by applying a binary threshold: every pixel brighter than 180 becomes white, and every other pixel becomes black.
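To make the two steps concrete, here is a pure-Python sketch of what they do to individual pixels. It is an illustration, not OpenCV's implementation: the helper names are hypothetical, the weights are the standard luminance weights that OpenCV documents for its color-to-gray conversion, and the tiny sample image assumes that "highlights" means light marker strokes such as yellow highlighter.

```python
def to_gray(pixel_bgr):
    """Weighted luminance of one (blue, green, red) pixel, rounded to an int."""
    b, g, r = pixel_bgr
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binary_threshold(gray_rows, thresh=180, maxval=255):
    """Map each gray value to maxval if it is above thresh, otherwise to 0."""
    return [[maxval if v > thresh else 0 for v in row] for row in gray_rows]


# A tiny 2x3 "image" in BGR order: white paper, yellow highlight, dark ink.
image = [
    [(255, 255, 255), (0, 255, 255), (20, 20, 20)],
    [(0, 255, 255), (20, 20, 20), (255, 255, 255)],
]

gray = [[to_gray(p) for p in row] for row in image]
print(gray)                    # [[255, 226, 20], [226, 20, 255]]
print(binary_threshold(gray))  # [[255, 255, 0], [255, 0, 255]]
```

The yellow pixels end up with a gray value of 226, which is above 180, so they turn white just like the paper, while the dark ink stays black.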

Final Thoughts

That’s it! OCR is really helpful if you have a lot of scanned paper documents, for example to turn a paper book into a digital one.

You can experiment with different threshold values and other image manipulation tools to improve OCR accuracy.
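One alternative to a hand-picked value such as 180 is to let the image choose its own threshold. The sketch below is a plain-Python version of Otsu's method, which picks the value that best separates dark pixels from bright ones. It is a swapped-in illustration rather than part of the script above, and the function name is hypothetical. OpenCV offers the same idea through the `cv2.THRESH_OTSU` flag of `cv2.threshold`.

```python
def otsu_threshold(gray_values):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, pixels > t the bright class.
    """
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1

    total = len(gray_values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels in the dark class so far
    sum_dark = 0  # sum of gray values in the dark class so far
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet
        w_bright = total - w_dark
        if w_bright == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        var_between = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Flattened gray values: three dark ink pixels and three bright paper pixels.
pixels = [10, 12, 11, 200, 210, 205]
print(otsu_threshold(pixels))  # 12
```

With the returned value, a test of the form `value > t` puts the three ink pixels on one side and the three paper pixels on the other.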

Regards

Vito
