The Surprisingly Devious History of CAPTCHA

Life in the Information Age changes so fast and so often that we barely notice. Take, for example, the CAPTCHA system for checking that an internet user is human, which became ubiquitous, then kind of sinister, then began to fade away.

The word CAPTCHA is an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart.” The original system was developed in the early 2000s by engineers at Carnegie Mellon University. The team, led by Luis von Ahn (who calls himself “Big Lou”), wanted to find a way to filter out the overwhelming armies of spambots pretending to be people.

They devised a program that would display some form of garbled, warped, or otherwise distorted text that the computers of the day couldn’t read, but a human could make out. All a user had to do was type the text in a box, and access was theirs.

The program was wildly successful. CAPTCHA became a ubiquitous tool and an accepted part of the internet user experience.

Unfortunately, the designers overlooked one very human trait: a need to get paid. Before too long, spam-sponsored CAPTCHA farms were popping up all over the internet, especially in poor countries, offering workers money to solve CAPTCHA boxes by the thousands.

Even with these spam farms, CAPTCHA was a solid product. But the engineers weren’t satisfied. Millions of people were voluntarily translating nonsensical images into text, which seemed, to von Ahn, like a waste of perfectly good free labor.

Speaking to The New York Times in 2011, von Ahn remembered thinking, “Can we do something useful with this time?”

After some more tinkering, reCAPTCHA was born and implemented on sites all over the internet. The general user experience was pretty much the same: type the letters and numbers you see onscreen. But rather than randomized text, reCAPTCHA asked users to transcribe images of real words and numbers taken from archival texts. Computers are pretty good at reading old documents, but smeary ink and damaged paper can make some words too hard for them to read. Fortunately for von Ahn, humans can still read those words just fine.

The team started with the archives of The New York Times, then sold the technology to Google, which began using it to transcribe old books. That’s right: you have likely worked for free for Google and The New York Times. Those grainy images of old-timey text are real words from real pages.

Von Ahn was pleased with the new version and confident that reCAPTCHA was here to stay. “We’ll be going for a long time,” he told the Times. “There’s a lot of printed material out there.”

But, as we said, this is the Information Age. Most of the programs and online behaviors that we take for granted today will be extinct in a few years, and the CAPTCHA dynasty is no exception.

In 2014, a Google analysis found that artificial intelligence could crack even the most complex CAPTCHA and reCAPTCHA images with 99.8 percent accuracy, rendering the programs useless as security devices.

In their place, Google unveiled the now-familiar “No CAPTCHA reCAPTCHA” system, which relies not on a user’s ability to decipher text, but on their online behavior prior to the security checkpoint. While a user is on a page, an invisible algorithm monitors how they interact with the content to determine whether they’re human or robot.

Then, at the checkpoint itself, users are asked to confirm a single statement: “I am not a robot.”

If the program believes you’re a human, all you have to do is check the box and move on. If you’re suspected of spambot tendencies, checking the box will open up a new challenge, like identifying all the kittens in a photo array.

The arms race between internet security experts and spambots may never end. In time, No CAPTCHA reCAPTCHA will be outsmarted, then replaced. And when that happens, pay attention.