Watch How Computers Perform Optical Character Recognition

iStock // nikolay100

Optical Character Recognition (OCR) is the key technology in scanning books, signs, and all other real-world texts into digital form. OCR is all about taking a picture of written language (or a set of letters, numbers, glyphs, you name it) and sorting out which specific characters are in there.

OCR is a hard computer science problem, though you wouldn't know it from how pervasive it has become in consumer software. Today, you can point a smartphone at a document, or a sign in a national park, and instantly get a pretty accurate OCR read-out, and even a translation. It has taken decades of research to reach this point.

Beyond the obvious problems—telling a lowercase "L" apart from the number "1," for instance—there are deep problems associated with OCR. For one thing, the system needs to figure out what font is in use. For another, it needs to sort out what language the writing is in, as that will radically affect the set of characters it can expect to see together. This gets especially weird when a single photo contains multiple fonts and languages. Fortunately, computer scientists are awesome.
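To make the "L" versus "1" problem concrete, here is a minimal sketch of one way context can settle an ambiguous glyph: look at the unambiguous characters around it and pick the reading that matches. This is an illustration only, not the method from the video or from any real OCR engine; the `CONFUSABLE` table and the `resolve_token` function are hypothetical names made up for this example.

```python
# Hypothetical table of look-alike glyphs: each maps to its (letter, digit) readings.
CONFUSABLE = {
    "l": ("l", "1"),
    "1": ("l", "1"),
    "O": ("O", "0"),
    "0": ("O", "0"),
}


def resolve_token(token: str) -> str:
    """Choose letter or digit readings for look-alike glyphs in one token.

    The decision is a simple majority vote among the characters in the
    token that are NOT ambiguous. Real systems use far richer context
    (fonts, dictionaries, language models); this is only a toy rule.
    """
    letters = sum(ch.isalpha() for ch in token if ch not in CONFUSABLE)
    digits = sum(ch.isdigit() for ch in token if ch not in CONFUSABLE)

    if letters == digits:
        # No evidence either way, so leave the token untouched.
        return token

    pick = 1 if digits > letters else 0  # index into the (letter, digit) pair
    return "".join(
        CONFUSABLE[ch][pick] if ch in CONFUSABLE else ch
        for ch in token
    )


if __name__ == "__main__":
    for raw in ["he1lo", "2O19", "l99l", "l0"]:
        print(raw, "->", resolve_token(raw))
```

With this rule, "he1lo" becomes "hello" because its neighbors are letters, while "2O19" becomes "2019" because its neighbors are digits. A token such as "l0" has no unambiguous neighbors, so it is left alone, which hints at why real OCR needs to know the language it is reading.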

In this Computerphile video, Professor Steve Simske (University of Nottingham) walks us through some of the key computer science challenges involved with OCR, showing common solutions by drawing them out on paper. Tune in and learn how this impressive technology really works.

A somewhat related challenge, also featuring Simske, is "security printing" and "crazy text." Check out this Computerphile video examining those computer science problems, for another peek into how computers see (and generate) text and imagery.