This story is over 5 years old.

Here's Why Emoji Can Break Apps

Emoji are all fun and games until they cause software problems.

Late last month, Laurie Stark, a growth editor at Upworthy, changed the nickname associated with her savings account to one that included an emoji. Her bank informed her that this broke their entire system. "They just called to let me know that they had to change my account name because it broke," Stark said on Twitter. (She declined to reveal the bank.)

I'm not saying it WAS me who texted this to @ajlobster but… it might have been me. https://t.co/35hnwmKJ3m
— Laurie Stark (@heylauriestark) May 24, 2016

About a year ago, after reporting on research about ways that hackers can track individuals using Bluetooth Low Energy profiles, I changed the name of my iPhone 5S to the nail polish emoji. I then tried to deposit a check into my bank account using the US Bank app on my iPhone, but it had stopped working. After I spent time troubleshooting with tech support over the phone, an internet support specialist emailed me, asking me to ensure my device name (found under Settings > General > About on my iPhone) didn't contain emoji. Once I changed it back to words, the app worked again. (US Bank's media representative did not respond to a request for comment, but that same support specialist told me that the app was updated last fall, and can now read device names with emoji.)

We're not the only people who have had trouble. Anne Van Kesteren, who works on web standards for Mozilla, once renamed his Wi-Fi network to the turtle emoji. At the time he was using a smart kettle (before disconnecting it due to security concerns), but it was unable to connect because of the emoji in the network's name.

Banking tips pic.twitter.com/8Q90f056EZ
— Anna Marquardt (@ajlobster) May 24, 2016

So why can't some apps read emoji? It has to do with the encoding systems computers use to represent characters. "Back in ancient (1960s) times, we used pretty much two 8-bit representational systems: ASCII (American Standard Code for Information Interchange) and EBCDIC (Extended Binary Coded Decimal Interchange Code)," David Gewirtz, who teaches programming at UC Berkeley, said in an email. "These are both 8-bit coding systems, which means you can have a total of 256 character variations, and that has to handle characters for EVERYTHING." In addition, said Gewirtz, most ASCII characters were implemented in 7 bits, and the eighth bit was reserved for things like letters with diacritical marks, such as ë or é. (While we can't use emoji without breaking apps, many people can't even enter their own names. In fact, I was unable to change my bank account nickname to a phrase with one of these letters.)

A 7-bit number can represent only 128 distinct characters, and that's not much: 52 of those are used for lowercase and uppercase letters, and numbers, punctuation, and other symbols quickly fill up the remaining 76. "That doesn't work for a lot of the Asian languages like Chinese or Japanese, that have at least 3000 characters or so in daily use and go up to like 60,000 if you include all the ones that are less often used," said Anne Van Kesteren.
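To make the 7-bit limit concrete, here is a minimal Python sketch. The `store_nickname` function is hypothetical: it stands in for any ASCII-only storage step and is not how any particular bank's system is known to work.

```python
ASCII_SLOTS = 2 ** 7  # 7 bits give 128 possible values


def fits_in_ascii(text: str) -> bool:
    """Return True if every character's code is below 128."""
    return all(ord(ch) < ASCII_SLOTS for ch in text)


def store_nickname(nickname: str) -> bytes:
    """Hypothetical ASCII-only storage step; raises on anything else."""
    return nickname.encode("ascii")


# A plain name, a name with a diacritic, and a name with the nail polish emoji.
for name in ["Savings", "Zo\u00eb's savings", "Savings \U0001F485"]:
    try:
        store_nickname(name)
        print(f"{name!r}: stored")
    except UnicodeEncodeError:
        print(f"{name!r}: rejected (not ASCII)")
```

Only the first nickname survives. The one with ë fails for the same reason the emoji does: both fall outside the 128 values that 7 bits can hold.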

This led to the birth of Unicode, which allows for more bits per character and makes room for many thousands more characters (including emoji), each represented as a single character. Unicode is more difficult to work with than ASCII, though, and Gewirtz points out that it requires libraries that don't always work correctly, so things break.
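A rough illustration of what "more bits per character" means in practice, using Python's built-in encoders. The `describe` helper is made up for this example. One plausible source of bugs (an assumption here, not something any of the companies above confirmed) is code that expects every character to take up the same amount of space.

```python
def describe(ch: str) -> dict:
    """Report the code point and encoded size of a single character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8_bytes": len(ch.encode("utf-8")),
        # Each UTF-16 code unit is two bytes.
        "utf16_units": len(ch.encode("utf-16-le")) // 2,
    }


# The letter A, the letter ë, and the nail polish emoji.
for ch in ["A", "\u00eb", "\U0001F485"]:
    print(describe(ch))
```

The letter A needs one byte in UTF-8, ë needs two, and the nail polish emoji needs four (and two UTF-16 code units), so software that assumes one byte or one unit per character can miscount or cut a character in half.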

Unicode isn't a cure-all. Software sometimes can't decode a stream of data into the correct symbol (and so shows the replacement character instead), and Unicode brings its own potential security issues and attacks. But at least we can use emoji or non-ASCII symbols in our usernames.
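The replacement symbol is easy to reproduce. In this small sketch, the turtle emoji's UTF-8 bytes are deliberately cut short to simulate damaged data; the function names are invented for illustration.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, the Unicode replacement character

turtle = "\U0001F422"
encoded = turtle.encode("utf-8")  # four bytes in UTF-8
cut_short = encoded[:2]           # simulate data truncated mid-character


def decode_lenient(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for anything undecodable."""
    return data.decode("utf-8", errors="replace")


def decodes_cleanly(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8 under strict decoding."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_cleanly(encoded), decodes_cleanly(cut_short))
print(REPLACEMENT in decode_lenient(cut_short))
```

A strict decoder refuses the damaged bytes outright, while a lenient one quietly swaps in the replacement character, which is why a broken emoji sometimes shows up as a question mark in a diamond rather than as an error.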