The internet, like any public space, physical or virtual, carries its own risks. Online, those risks are plentiful, and they come in myriad shapes and forms.
Today’s unsung hero, and the topic of our article, is the CAPTCHA program, along with the multiple iterations of it that followed over the years. This little box-shaped service, while often eliciting frustration at the mere sight of it, has been doing the internet’s good work of keeping websites clean.
Clean from what, you may ask? From bots, obviously.
Safeguarding from spam
Ever since the early days of the internet, it has been possible to script large numbers of bots to perform malicious activity. This could range from flooding a site with fake traffic to spamming a submission form on, say, a government site. The potential for abuse was limitless, and so IT designers and technicians needed a way to protect their clients’ online portals.
Enter the CAPTCHA, which stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.” The name takes its inspiration from the original Turing test for AI, although a CAPTCHA is really a sort of anti-Turing test: here a machine, not a human, is the judge. While these bots were never as sophisticated as even a basic or pseudo-AI, they were still a heap of trouble for IT professionals.
Created between 1997 and 2000 by two teams in the United States, the program was first used on the now-defunct AltaVista search engine (later purchased by Yahoo!) to protect it from the thousands of spam inquiries it received per day. The term CAPTCHA was officially coined in 2003 in an academic paper by the team at Carnegie Mellon University.
Around that time, many online users began to stumble on these then-unfamiliar boxes asking them to identify squiggly, distorted words to prove they were human. While the test was never intended to insult, it was quite annoying to take what is essentially an eye examination just to go about your business online. And as months and years passed, these blobs of text only became more distorted and harder to read, as hackers and malicious actors tweaked their bots to become smarter.
To make matters worse, these spammers would even get humans to do their bidding, either knowingly (by paying people to solve CAPTCHAs) or unknowingly (by relaying the CAPTCHA to another site, where an unsuspecting user would solve it believing it to be a standard security check, essentially becoming an unwitting accomplice of the hacker).
It is worth noting that a somewhat similar practice was used by benevolent companies like Google, for similarly benevolent goals. More on that in a second.
Naturally, as spammers upped their bot-scripting game, ethical programmers had to bolster their own CAPTCHA defenses, which led to new iterations of the software. Users now had to identify traffic lights, fire hydrants, and more to answer the ironically existential question of “Are you human?”
Soon, however, CAPTCHA would be reborn as ReCAPTCHA, with extra functionality added in.
ReCAPTCHA would display two words side by side, one easy to read and the other difficult. The easy word had a correct answer stored in the system, while the difficult word was in fact unknown to the ReCAPTCHA program. This is because both words were scanned from books: OCR (Optical Character Recognition) software, which converts printed text into digital text, could identify the easy word but struggled with the difficult one. When a human solved the easy word correctly, the system would assume their answer for the second word was probably correct too, and accept it once it had been tallied against the answers of multiple other users. In this way, human users were essentially helping to digitize the hundreds of thousands of books being scanned for the Internet Archive. It was quite the ingenious effort by the School of Computer Science at Carnegie Mellon University to facilitate the process of book scanning.
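The two-word scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual ReCAPTCHA code: the function names, the sample words, and the thresholds (at least 3 votes, 75% agreement) are all assumptions made up for the example.

```python
from collections import Counter
from typing import List, Optional


def check_submission(known_answer: str, typed_known: str) -> bool:
    """Return True if the user typed the easy (control) word correctly."""
    return typed_known.strip().lower() == known_answer.strip().lower()


def resolve_unknown_word(
    votes: List[str], min_votes: int = 3, min_share: float = 0.75
) -> Optional[str]:
    """Return the agreed transcription of the hard word, or None if
    there is no consensus yet. Both thresholds are hypothetical."""
    normalized = [v.strip().lower() for v in votes if v.strip()]
    if len(normalized) < min_votes:
        return None
    word, count = Counter(normalized).most_common(1)[0]
    return word if count / len(normalized) >= min_share else None


# Each tuple is (what the user typed for the easy word,
#                what the user typed for the hard word).
submissions = [
    ("morning", "upon"),
    ("morning", "upon"),
    ("mourning", "open"),  # failed the easy word, so this guess is discarded
    ("Morning", "upon"),
    ("morning", "up0n"),
]

# Only guesses from users who passed the easy word count as votes.
votes = [guess for typed, guess in submissions if check_submission("morning", typed)]
consensus = resolve_unknown_word(votes)
print(consensus)
```

The easy word acts as the actual human check, while the hard word is only accepted as a transcription once enough independent users agree on it.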
ReCAPTCHA was acquired by Google in 2009.
The evolution of CAPTCHA
Today, you have likely stumbled on simpler CAPTCHAs, where you’re simply asked to tick a box. The software analyzes your mouse clicks, click duration, mouse movement, cookies, and more to assign you a human score, by which you pass or fail. If you fail, you’re often asked to identify those detested “traffic lights.” These checkbox ReCAPTCHAs became known as Version 2.
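To make the idea of a “human score” concrete, here is a toy sketch in Python. Google does not publish how the real score is computed, so everything below is an assumption for illustration only: the three signals, their weights, the timing ranges, and the 0.7 pass threshold are invented, and the names `Interaction`, `human_score`, and `decide` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    click_duration_ms: float   # time between mouse button down and up
    pointer_path_points: int   # mouse positions recorded before the click
    has_prior_cookie: bool     # browser carried a cookie from an earlier visit


def human_score(event: Interaction) -> float:
    """Combine a few signals into a score between 0.0 and 1.0.
    The weights and ranges are made up for this example."""
    score = 0.0
    # An instantaneous click is more typical of a script than of a hand.
    if 40 <= event.click_duration_ms <= 400:
        score += 0.4
    # A human usually moves the pointer toward the box before clicking.
    if event.pointer_path_points >= 10:
        score += 0.4
    # Some browsing history makes a throwaway bot session less likely.
    if event.has_prior_cookie:
        score += 0.2
    return round(score, 2)


def decide(event: Interaction, pass_threshold: float = 0.7) -> str:
    """Pass the user, or fall back to the image challenge."""
    return "pass" if human_score(event) >= pass_threshold else "image_challenge"


print(decide(Interaction(click_duration_ms=120, pointer_path_points=35, has_prior_cookie=True)))
print(decide(Interaction(click_duration_ms=0, pointer_path_points=0, has_prior_cookie=False)))
```

A visitor who scores below the threshold is not blocked outright; they are handed the harder image challenge instead, which matches the fallback to “traffic lights” described above.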
In 2018, Google introduced ReCAPTCHA v3, the most seamless, streamlined version of the software yet. This version is far more analytical: it studies users’ behavior and cross-references it with the known activity of bots in order to weed them out.
For now, CAPTCHAs are here to stay. But for how long?
Luis von Ahn, one of the creators of the software, said in 2012 that humans could probably beat machines for another 10 years.
“I’m certain it will happen at some point that computers are as good at this as humans,” he said. “At that point, we’ll have to figure something else out.”