TeCoEd (Teaching Computing Education)

What is it?


Using Python and a Raspberry Pi, plus three lines of code, you can hack a picture or image and scrape all the text into the console window.  The program uses OCR (Optical Character Recognition), a technology that converts different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera, into editable and searchable data.  This code works with images in a range of file types, including jpeg and png.

1. Getting Started


This is a really easy hack which basically requires three lines of code and a couple of additional libraries.  Firstly update your Raspberry Pi:  

In the LX Terminal type:
sudo apt-get update
sudo apt-get upgrade

Then install Google's Tesseract OCR software by typing:
sudo apt-get install tesseract-ocr
(tesseract-ocr is a project Google has been working on; full details are available here, including extra code and developments.)
Next install the Python wrapper for the Tesseract-OCR software. This enables you to program the OCR using Python code.

In the LX Terminal, install it using pip:
sudo pip install pytesseract

The final part is to install the Python Imaging Library (PIL):
sudo apt-get install python-imaging
sudo apt-get install python-imaging-tk


Then reboot the Pi
sudo reboot
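
Once the Pi is back up, it can be handy to confirm that everything installed before writing any OCR code. The snippet below is only a rough sketch of such a check and is not part of the original hack. It assumes Python 3 (run it with python3), and the function name check_ocr_setup is made up for this example. It looks for the tesseract command on the PATH and checks whether the pytesseract and PIL modules can be found.

```python
import importlib.util
import shutil


def check_ocr_setup():
    """Report whether each piece of the OCR setup can be found.

    check_ocr_setup is a hypothetical helper name used for illustration.
    """
    return {
        # The tesseract-ocr package should put a 'tesseract' command on the PATH
        "tesseract command": shutil.which("tesseract") is not None,
        # find_spec returns None when a top-level module is not installed
        "pytesseract module": importlib.util.find_spec("pytesseract") is not None,
        "PIL module": importlib.util.find_spec("PIL") is not None,
    }


for name, found in check_ocr_setup().items():
    print("%-20s %s" % (name, "found" if found else "MISSING"))
```

If any line reports MISSING, repeat the matching install step above before carrying on.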

Three Line Hack



2. The Code


Now download or create an image which contains text; the two below worked very well.  I also tried a screenshot of a website and had about 70% success, with some random characters and other issues in the output.
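
If you want to put a rough number on how well a picture scanned, one option is to compare the text the OCR returns with the text you know is in the image. The sketch below is an illustration only and is not how the 70% figure above was measured. It uses difflib from the Python standard library, and the function name ocr_similarity is made up for this example.

```python
import difflib


def ocr_similarity(ocr_text, expected_text):
    """Return a score from 0.0 (nothing matches) to 1.0 (identical).

    ocr_similarity is a hypothetical helper name used for illustration.
    """
    # Collapse runs of spaces and newlines so layout differences are ignored
    ocr_clean = " ".join(ocr_text.split())
    expected_clean = " ".join(expected_text.split())
    # ratio() is 2 * matching characters / total characters in both strings
    matcher = difflib.SequenceMatcher(None, ocr_clean, expected_clean)
    return matcher.ratio()


# One wrong character out of four in each string
print(ocr_similarity("abcd", "abcf"))
```

Pass the string returned by the OCR as the first argument and the text you typed out by hand as the second.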

In the LX Terminal type
sudo idle

Open a new Python window and add the following code
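# PIL loads the picture; pytesseract passes it to the Tesseract OCR engine
from PIL import Image
import pytesseract
# Scan test.png and print the recognised text to the console
print(pytesseract.image_to_string(Image.open('test.png')))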


Where test is the name of the picture you wish to scan. I tried jpg and png and they both worked well.  Save the program into the same folder as the pictures and hit F5 to run.  It really is that simple!
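
If you have several pictures in the folder, the same call can be looped over all of them. This is a minimal sketch rather than part of the original three line hack. The function names find_images and ocr_folder are made up for this example, and it assumes the pictures end in .png, .jpg or .jpeg.

```python
import os

# File endings treated as pictures (an assumption for this sketch)
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def find_images(folder):
    """Return the sorted paths of the picture files in folder."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )


def ocr_folder(folder):
    """Scan every picture in folder and return {path: recognised text}."""
    # Imported here so find_images() still works without the OCR libraries
    from PIL import Image
    import pytesseract

    results = {}
    for path in find_images(folder):
        results[path] = pytesseract.image_to_string(Image.open(path))
    return results
```

Calling ocr_folder('.') from a program saved in the same folder as the pictures returns a dictionary, so you can loop over it and print each file name followed by its text.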
Some other Tesseract-OCR Resource Links:
  1. Google Tesseract-OCR
  2. Python Wrapper on GitHub
  3. Other Projects