How Taco Emoji—and Hittite Hieroglyphs—Get to Your Screen
Remember when there was all that internet racket about the taco emoji? The trending hashtag, the t-shirts, the campaigns—it was a global initiative, and it worked. And we were happy, for a while ... at least until we realized that there was no eyeroll emoji.
Meanwhile, the group we were haranguing for more small pictures was busy with quite a few other projects. Like saving entire languages from extinction, dealing with multi-governmental conflicts, and creating the code that allows you to read this right now.
Unicode is a universal coding system, created and maintained by the Unicode Consortium.
It’s the reason you can email someone in China and they can read it. It’s why I can text my Dad’s Android from iOS and he can yell-text me back in all caps because he doesn’t know how not to (not Unicode’s fault). And it’s how I can build a web page that Google can find for someone else, no matter where they are, even someone in space using rocketship software. It’s a universal character language that all computers speak. The digital tower of Babel.
The Unicode Consortium website, a derelict relic of the '90s (from here on out referred to as the Hidden Temple), puts it like this: “Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers.”
No single one of these systems could contain enough characters to cover all written languages, and some languages needed multiple encoding systems just to cover all their letters, punctuation, and symbols.
Not only were these systems inadequate on their own, they also conflicted with one another. For instance, two encodings could use different numbers for the letter M—or worse, they could use the same number for two different letters.
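Both kinds of conflict are easy to reproduce with the legacy code pages that still ship with Python. The sketch below is only an illustration, not anything from the Unicode Consortium: it assumes Python 3 and uses the standard codecs "ascii", "cp037" (an IBM EBCDIC code page), "latin-1" (ISO 8859-1), and "cp1251" (Windows Cyrillic) as stand-ins for the old, competing systems.

```python
# Illustration of two legacy-encoding conflicts, using codecs bundled with Python.

# Conflict 1: the same letter gets different numbers in different encodings.
ascii_m = "M".encode("ascii")[0]   # byte value of "M" in ASCII
ebcdic_m = "M".encode("cp037")[0]  # byte value of "M" in EBCDIC (code page 037)

# Conflict 2: the same number stands for different letters in different encodings.
raw = bytes([0xE9])
as_western = raw.decode("latin-1")  # read the byte as Western European text
as_cyrillic = raw.decode("cp1251")  # read the same byte as Cyrillic text

print(f"'M' in ASCII: {ascii_m:#04x}, in EBCDIC: {ebcdic_m:#04x}")
print(f"byte 0xE9 as Latin-1: {as_western!r}, as Windows-1251: {as_cyrillic!r}")
```

A program that guessed the wrong code page would quietly show the wrong letter, which is exactly the kind of garbling the next paragraph describes.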
So if you emailed “I love you” to someone, but they were using a program with a conflicting encoding system, they might get a message that looks more like “◻◻◻◻◻.” Relationship over. Or at the very least in a grey area.
Unicode saved your relationship by providing a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. So “I love you” shows up in the exact script in which it was sent.
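Here is a minimal sketch of the “one number per character” idea, assuming Python 3 and its standard unicodedata module. Every character, taco included, has a single code point (the taco is U+1F32E). UTF-8 is used here as one common standard way to turn those numbers into bytes for sending. The message text and variable names are just for illustration.

```python
import unicodedata

# Every character has exactly one Unicode code point, on every platform.
taco = "\U0001F32E"
print(f"U+{ord(taco):04X}", unicodedata.name(taco))  # U+1F32E TACO

message = "I love you \U0001F32E"
code_points = [f"U+{ord(ch):04X}" for ch in message]
print(code_points)

# Sender and receiver agree on Unicode (here serialized as UTF-8),
# so the bytes decode back to exactly the characters that were sent.
sent = message.encode("utf-8")
received = sent.decode("utf-8")
print(received == message)
```

Because both ends look up the same numbers, the receiving program doesn't have to guess which of hundreds of code pages the sender had in mind.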
Today, the Unicode Consortium is a non-profit corporation governed by a board of directors and led by its president, Mark Davis, who also works at Google. It is funded by its various members and interested parties, ranging from Microsoft and Apple to the Indian Government and UC Berkeley.
The entire organization functions out of a one-room office at the Microsoft campus in Mountain View, and only the office manager works there—and even then, only part time. Most of the technical work is distributed and done online by volunteers, with most big decisions being made at the Unicode Technical Committee’s quarterly in-person meetings. There are people in the world who choose to attend these in their free time.
In the last 25 or so years, the Unicode Consortium has encoded more than 120,000 characters and 129 scripts (which support far more than 129 languages), including scripts for languages that have long been dead, like the Hittite hieroglyphs (encoded as Anatolian Hieroglyphs) from thousands of years ago. Encoding these scripts ensures that they'll survive in a digital future, and Unicode has basically archived the entire history of writing in a digital, interchangeable way.
But the taco emoji is what finally got them in the news.
The original Unicode-encoded emoji came out in 2010, but most of the public didn’t realize this until around the time Unicode 7.0 was released in 2014. That’s when emoji were showing up on both iOS and Android, making the dancing lady in the red dress available to all and feeding the public’s outcry for more more more.
Another large set of emoji is coming next year, and a tentative list is already out. It includes the “call me” hand, so you can simulate making a call circa 2001.
But the biggest additions to Unicode will be Tangut, a historical writing system from China, and another large collection of Chinese characters. They’ll also be adding a few minority scripts, including Adlam from West Africa; Osage, the script of a U.S. Native American language; and Newa of Nepal, the encoding of which has helped to legitimize a population’s native tongue and revealed a still-fraught relationship between a government and one of its minority groups.
But the emoji will get all the press.