
What High-Bandwidth Memory Is and Why You Should Care

The world demands massive parallel computation, but technology is just barely keeping up.

AMD, a name usually associated with lower-end computer processors, is more cutting-edge than you think.

The hardware world, and even a bit beyond, has been buzzing this week thanks to AMD's revelation that its long-in-progress high-bandwidth memory (HBM) stacks are finally here—four years after the chip-maker abruptly canceled its initial planned release as part of a GPU dubbed Tiran. AMD's stacks couldn't have come soon enough: memory is poised to be an epic, imminent bottleneck in advancing processing speeds, particularly when it comes to graphics (read: games). Memory needs to be faster, much faster. Now.


The processors of the future can be mind-blowingly, blazingly quick, and it won't make the slightest bit of difference if memory can't keep up. A CPU or GPU needs quick access to its addressable memory banks—RAM, generally—because otherwise, speed is just a number. Computers need data to compute.
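To put rough numbers on the imbalance, here's a back-of-the-envelope sketch (the one-teraflop figure and the SAXPY workload are illustrative assumptions, not any particular chip's spec). SAXPY performs two floating-point operations for every twelve bytes it moves, so a fast processor needs staggering bandwidth just to stay busy:

    // Roofline-style arithmetic with assumed, round numbers.
    // SAXPY (y[i] = a*x[i] + y[i]) does 2 FLOPs per 12 bytes moved:
    // read x[i] and y[i], write y[i], 4 bytes each.
    #include <stdio.h>

    int main(void) {
        double peak_flops     = 1e12;        // assume a 1 TFLOP/s processor
        double flops_per_byte = 2.0 / 12.0;  // arithmetic intensity of SAXPY
        double needed_bw      = peak_flops / flops_per_byte;
        printf("Bandwidth needed to keep the cores fed: %.0f GBps\n",
               needed_bw / 1e9);             // prints 6000 -- far beyond any RAM
        return 0;
    }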

The situation is especially acute when it comes to GPUs, which "drink memory bandwidth like a big-block V8 drinks gas," in the words of PC World's Gordon Mah Ung.

First, a quick memory refresher. Computer processors, in the canonical sense, consist of circuits that perform simple mathematical operations on data supplied by either registers (a small set of storage slots built into the processor itself, holding the data being computed at this very moment) or addressable memory, which is basically a supply of memory cells that are loaned out to different programs and procedures in chunks. The memory bottleneck has to do with this second sort of memory—how small and densely packed it is, and how fast it can be accessed.
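In code, the distinction looks something like this minimal C sketch (where values actually live is ultimately up to the compiler, but the idea holds): the running total stays in a register inside the processor, while every array element has to be fetched from addressable memory.

    /* Summing an array: `acc` lives in a register; data[] lives in RAM. */
    float sum(const float *data, int n) {
        float acc = 0.0f;        /* register: inside the processor, instantly at hand */
        for (int i = 0; i < n; i++)
            acc += data[i];      /* each iteration waits on a fetch from memory */
        return acc;
    }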

In a GPU, memory takes on a special role. Here, the central idea is manipulating as much memory as possible as quickly as possible; think about all of the continuous updating that has to occur, pixel by pixel, to render even a minor video game. As in a CPU, this happens in a constant cycle between the unit's memory banks and its processor cores—the difference is that where a CPU has a handful of superfast cores performing computations in serial (more or less), a GPU has hundreds of slower cores all operating in parallel. In large part, this takes advantage of the fact that a graphics application may have many different pieces of data all requiring the same computation at the same time (imagine a change in lighting across a scene). This is the perfect problem for parallel computing, and it even has a name: single instruction, multiple data (SIMD).
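Here's what that pattern looks like as a hypothetical CUDA kernel (the names and launch numbers are made up for illustration): one instruction stream, executed by thousands of threads, each on its own pixel.

    // Hypothetical "change the lighting" kernel: every thread runs the
    // same instruction (scale by `brightness`) on a different pixel -- SIMD.
    __global__ void relight(float *pixels, int n, float brightness)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's pixel
        if (i < n)
            pixels[i] *= brightness;                    // same op, different data
    }

    // Launched with enough threads to cover the whole frame, e.g.:
    //   relight<<<(n + 255) / 256, 256>>>(d_pixels, n, 0.8f);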


Enter high-bandwidth memory.

The current GPU memory standard-bearer is known as GDDR5, or graphics double data rate type five synchronous graphics random access memory (phew). GDDR5 chips have a crucial limitation in how they connect to a GPU: via contacts around their edges. To add bandwidth, we add more GDDR5 chips around the perimeter of the GPU itself, with the memory units lying flat, side by side. The result is an outward sprawl of memory die and the need for longer and longer wires connecting everything—which costs both speed and power.

This is where we have our bottleneck: increasing distances between GPUs and RAM. HBM gets around this in a seemingly simple way by making the memory die stackable. Using tiny holes called through-silicon vias, or TSVs, HBM stacks memory dies on top of one another, allowing (theoretically) four times the RAM to sit next to the GPU just by going vertical. As AMD Chief Technology Officer Joe Macri tells PC World, a GDDR5 chip will support a 32-bit-wide memory bus running at up to 7Gbps per pin—28GBps for the chip—while a single stack of HBM RAM supports a 1,024-bit-wide bus and more than 125GBps. HBM, according to Macri, is also more efficient in terms of power consumption, hitting 35GBps of memory bandwidth per watt consumed vs. GDDR5's 10.5GBps.
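Those figures fall out of simple arithmetic—bus width in bytes times transfer rate per pin—assuming GDDR5's cited 7Gbps per pin and roughly 1Gbps per pin for HBM (an assumption consistent with its much wider, slower bus):

    /* Peak bandwidth = (bus width in bits / 8) x per-pin rate in Gbps.
       Per-pin rates are cited/assumed figures, not measurements. */
    #include <stdio.h>

    int main(void) {
        double gddr5 = (32   / 8.0) * 7.0;  /*  32-bit bus x 7 Gbps/pin */
        double hbm   = (1024 / 8.0) * 1.0;  /* 1024-bit bus x 1 Gbps/pin */
        printf("GDDR5: %.0f GBps per chip\n",  gddr5);  /* 28  */
        printf("HBM:   %.0f GBps per stack\n", hbm);    /* 128 */
        return 0;
    }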

The implications are wider than graphics and gaming. Increasingly, high-performance computation tasks are being offloaded onto GPUs in what's known as GPU-accelerated computing. Initially a concept mostly applicable to engineering and scientific applications, GPU offloading is becoming an important feature of mobile device programming as well. Separating GPUs from CPUs will make even less sense in the future, for almost any sort of application.

Memory technology has to do much more than keep up nowadays; it has to stay far ahead.