The Internet Has a Huge C/C++ Problem and Developers Don't Want to Deal With It
What do Heartbleed, WannaCry, and million-dollar iPhone bugs have in common?
The Weakest Link is Motherboard's third annual theme week dedicated to the future of hacking and cybersecurity.
Alex is a software security engineer at Mozilla, where he works on sandboxing and anti-exploitation for Firefox. Previously he was a software engineer with the United States Digital Service, and served as a member of the board of directors of both the Python and Django Software Foundations.
One bug affects iPhones, another affects Windows, and the third affects servers running Linux. At first glance these might seem unrelated, but in reality all three were made possible because the software that was being exploited was written in programming languages which allow a category of errors called "memory unsafety." By allowing these types of vulnerabilities, languages such as C and C++ have facilitated a nearly unending stream of critical computer security vulnerabilities for years.
Imagine you had a program with a list of 10 numbers. What should happen if you asked the list for its 11th element? Most of us would say an error of some sort should occur, and in a memory-safe programming language (for example, Python or Java) that's what would happen. In a memory-unsafe programming language, the program will look at wherever in memory the 11th element would be (if it existed) and try to access it. Sometimes this will result in a crash, but in many cases you get whatever happens to be at that location in memory, even if that portion of memory has nothing to do with the list. This type of vulnerability is called a "buffer overflow," and it's one of the most common memory unsafety vulnerabilities. Heartbleed, which impacted 17 percent of the secure web servers on the internet, was a buffer overflow exploit, letting you read up to nearly 64 kilobytes past the end of a list, including passwords and other users' data.
Buffer overflows aren't the only type of memory unsafety vulnerability in C/C++, though. Other examples are "type confusion" (mixing up what type of value exists at a place in memory), "use after free" (using a piece of memory after you told the operating system you were done with it), and "use of uninitialized memory" (using a piece of memory before you've stored anything in it). Together, these form some of the most common vulnerabilities across widely used software such as Firefox, Chrome, Windows, Android, and iOS. I've been tracking the security advisories for these projects for more than a year, and in almost every release of these products, more than half of the vulnerabilities are memory unsafety. More disturbingly, the high- and critical-severity vulnerabilities (generally those which can result in remote code execution, where an attacker can run any code they want on your computer; this is usually the most severe type of vulnerability) are almost always memory unsafety. From my own security research into the widely used open source image-processing libraries ImageMagick and GraphicsMagick, in the last year I've found more than 400 memory unsafety vulnerabilities.
If these vulnerabilities are so prevalent and cause so much damage, and there are languages that don't have these pitfalls, why are the unsafe languages still so common? First, while there are now good choices of languages that prevent memory unsafety vulnerabilities, this wasn't always the case. C and C++ are decades-old and enormously popular languages, while memory-safe languages suited to low-level work like web browsers and operating systems, such as Rust and Swift, are only just starting to achieve popularity.
A bigger issue is that when developers sit down to choose a programming language for a new project, they're generally making their decision based on which languages their team knows, performance, and the ecosystem of libraries that can be leveraged. Security is almost never a core consideration. This means languages which emphasize security at the cost of ease of use are at a disadvantage.
Furthermore, many of the most important software projects for internet security are not new; they were started a decade or more ago. Linux, OpenSSL, and the Apache web server, for example, are all more than twenty years old. For massive projects like these, simply rewriting everything in a new language isn't an option; they need to be incrementally migrated. This means such projects will need to be written in two languages instead of one, which increases complexity. It can also mean retraining a huge team, which takes time and money.
Finally, the largest problem is that many developers don't believe there's a problem at all. Many software engineers believe the problem is not that languages like C/C++ facilitate these vulnerabilities; it's that other engineers write buggy code. According to this theory, the problem isn't that trying to get the 11th item in a 10-item list can result in a critical vulnerability; the problem is that someone wrote code which tries to get the 11th item in the first place, that the author either wasn't a good enough engineer or wasn't disciplined enough. In other words, some people think the problem isn't with the programming language itself, only that some people don't know how to use it well.
Many developers find this position compelling, despite the mountain of evidence to the contrary: these vulnerabilities are omnipresent, and affect even the companies with the largest security budgets and the most talented developers. It's one thing to discuss the tradeoffs and how we can make memory-safe languages easier to learn, but after thousands upon thousands of vulnerabilities that were preventable with a better programming language, the evidence makes it clear that "try harder not to have bugs" is not a viable strategy.
However, there is some good news: not everyone is in denial about this problem. Rust (disclosure: Rust's primary sponsor is my employer, Mozilla) is a relatively new programming language which aims to be usable for every problem C and C++ are used for, while being memory safe and thus avoiding these security pitfalls. Rust is gaining adoption: it's now used by Mozilla, Google, Dropbox, and Facebook, which I believe demonstrates that many people are starting to look for systemic solutions to the memory unsafety problem. Further, Apple's Swift programming language is also memory safe, while its predecessor, Objective-C, was not.
There are a number of things we can do to accelerate the search for a comprehensive solution to the ongoing security disaster that is memory unsafety. First, we can get better at quantifying how much damage memory unsafety causes. While I've been compiling rudimentary statistics for a few projects, there is an opportunity for more rigorous tracking. The CVE project, an industry-wide database of known vulnerabilities, could record, for every vulnerability, whether it was a memory unsafety issue and whether a memory-safe language would have prevented it. This would help us answer questions like, "Which projects would benefit most from a memory-safe programming language?"
Second, we should invest in research into how best to migrate existing large software projects to memory-safe languages. Currently, the idea of migrating something like the Linux kernel to a different programming language is almost too large a task to imagine. Dedicated research into what sort of tools could facilitate this, or how programming languages could be designed to make it easier, would dramatically reduce the cost of improving older, larger projects.
Finally, we can shift the culture around security within software engineering. When I first learned C++ in college, it was expected that sometimes your program would crash. No one ever told me that many of those crashes were also potential security vulnerabilities. That developers aren't shown, from early in their careers, the connection between the bugs and crashes they encounter and security issues is emblematic of how security is a secondary concern in software engineering and in how we teach it. When creating new projects, it should be accepted that one of the criteria for choosing a programming language is "How will this choice impact security?"
Memory unsafety is currently a scourge for our industry. But it doesn't have to be the case that every Windows or Firefox release fixes dozens of avoidable security vulnerabilities. We need to stop treating each memory unsafety vulnerability as an isolated incident, and instead treat them as the deeply rooted, systemic problem they are. And then we need to invest in engineering research into how we can build better tools to solve this problem. If we make that change and that investment, we can make a dramatic improvement in computer security for all users, and make Heartbleed, WannaCry, and million-dollar iPhone bugs far less common.