Imagine a room full of 1,000 computers, but at nanoscale.
Engineers at the University of California at Davis have built the world's first "KiloCore" chip. The chip, which was presented this week at the 2016 Symposium on VLSI Technology and Circuits, features 1,000 independently programmable processors, is capable of 1.78 trillion instructions per second, and contains 621 million transistors. Funded in part by the Department of Defense, the KiloCore was fabricated by IBM using existing 32-nanometer semiconductor fabrication technology.
Unfortunately, a 1,000-core chip isn't something that could just be plugged into the next line of MacBook Pros. It wouldn't even really suffice as a graphics processor, where massively parallel computation is the norm. In fact, many GPUs exceed the 1,000 cores of the UC Davis chip, but with the caveat that the individual cores are directed by a central controller. The KiloCore, by contrast, is built from fully independent cores, each capable of running its own separate program.
The independence of the cores makes the KiloCore chip a multiple instruction, multiple data (MIMD) computer. This is in contrast to the more typical single instruction, multiple data (SIMD) variety of parallel computation, as would be expected in a graphics processor. A SIMD machine's version of parallelism is to carry out the same single operation across many different cores. That is, it does the same thing to many different units of data. This is the norm in image processing, for example, where a lot of different pixels holding a lot of different values are all updated in the same way. A MIMD machine, by contrast, can run a different program on each core, which suits more varied and complex calculations.
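The distinction can be made concrete with a toy sketch. This is plain Python, not KiloCore code, and the function names `simd_apply` and `mimd_apply` are invented for illustration; it only models the programming style, not real parallel hardware.

```python
def simd_apply(op, data):
    """SIMD style: one operation, applied to every unit of data."""
    return [op(x) for x in data]


def mimd_apply(programs, data):
    """MIMD style: each 'core' runs its own program on its own data."""
    return [program(x) for program, x in zip(programs, data)]


pixels = [10, 20, 30, 40]

# SIMD: every pixel is brightened in exactly the same way.
brightened = simd_apply(lambda p: p + 5, pixels)

# MIMD: four hypothetical cores, each doing something different.
programs = [
    lambda x: x + 5,   # core 0 brightens
    lambda x: x * 2,   # core 1 doubles
    lambda x: -x,      # core 2 negates
    lambda x: x ** 2,  # core 3 squares
]
mixed = mimd_apply(programs, pixels)

print(brightened)
print(mixed)
```

In the SIMD case there is one instruction stream and many data items; in the MIMD case there are as many instruction streams as there are cores.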
"Perhaps thinking of the chip as a room filled with 1000 computers is helpful, though a stretch," Bevan Baas, a computer engineering professor and leader of the UC Davis team that developed the chip, told me.
According to the researchers, the KiloCore chip is the most energy-efficient "many core" processor ever reported. The cores each max out at around 1.78 GHz and, because they are all independently clocked, can be shut down individually when not in use. Together, the 1,000 processors can execute 115 billion instructions per second while dissipating only 0.7 watts. As noted in a UC Davis press release, that power requirement is low enough to be supplied by a single AA battery, and the chip executes instructions around 100 times more efficiently than a normal laptop processor.
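A quick back-of-the-envelope check shows how the reported figures fit together. The sketch below uses only numbers quoted in this article; the derived values (instructions per cycle, energy per instruction) are my own arithmetic, not figures published by the UC Davis team.

```python
# Figures quoted in the article.
PEAK_INSTRUCTIONS_PER_SEC = 1.78e12       # chip-wide peak rate
NUM_CORES = 1000
MAX_CLOCK_HZ = 1.78e9                     # per-core maximum clock
LOW_POWER_INSTRUCTIONS_PER_SEC = 115e9    # rate at the low-power point
LOW_POWER_WATTS = 0.7

# Peak rate per core, and what that implies per clock cycle.
per_core_rate = PEAK_INSTRUCTIONS_PER_SEC / NUM_CORES
instructions_per_cycle = per_core_rate / MAX_CLOCK_HZ

# Energy efficiency at the low-power operating point.
instructions_per_joule = LOW_POWER_INSTRUCTIONS_PER_SEC / LOW_POWER_WATTS
picojoules_per_instruction = 1e12 / instructions_per_joule

print(f"per-core peak rate: {per_core_rate:.3g} instructions/s")
print(f"instructions per cycle per core: {instructions_per_cycle:.2f}")
print(f"instructions per joule: {instructions_per_joule:.3g}")
print(f"energy per instruction: {picojoules_per_instruction:.1f} pJ")
```

In other words, the peak figure is consistent with each core completing about one instruction per clock cycle, and the low-power figure works out to roughly 6 picojoules per instruction.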
The energy savings here largely come from abandoning the traditional system memory architecture, in which data for multiple cores is stored in a central RAM unit. Rather than sharing data in this way, the KiloCore chip uses a built-in networking scheme in which data is transferred directly between the different processors using packet- and circuit-switched networking.
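To get a feel for message passing as opposed to shared memory, here is a minimal software analogy using Python threads and queues. It is only a sketch: the `core` and `run_pipeline` names are hypothetical, and nothing here reflects the KiloCore's actual network hardware or instruction set. Each "core" owns no shared state and simply receives a message, runs its own program, and forwards the result to its neighbor.

```python
import queue
import threading

STOP = object()  # sentinel message telling a core to shut down


def core(program, inbox, outbox):
    """One 'core': read a message, apply its own program, forward the result."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            outbox.put(STOP)  # pass the shutdown signal downstream
            return
        outbox.put(program(msg))


def run_pipeline(programs, inputs):
    """Chain cores together with point-to-point links (FIFO queues)."""
    links = [queue.Queue() for _ in range(len(programs) + 1)]
    threads = [
        threading.Thread(target=core, args=(p, links[i], links[i + 1]))
        for i, p in enumerate(programs)
    ]
    for t in threads:
        t.start()
    for x in inputs:
        links[0].put(x)
    links[0].put(STOP)

    results = []
    while True:
        out = links[-1].get()
        if out is STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Two cores in a row: the first adds one, the second doubles.
results = run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3])
print(results)
```

No core ever reads another core's variables; the only way data moves is as a message over a link, which is the basic idea behind trading a central memory for an on-chip network.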
"The cores do not utilize explicit hardware caches and they operate more like autonomous computers that pass information by messages rather than a shared-memory approach with caches," Baas explained. "From the chip level point of view, the shared memories are like storage nodes on the network that can be used to store data or instructions and in fact can be used in conjunction with a core so it can execute a much larger program than what fits inside a single core."
The UC Davis group has already developed applications for the KiloCore, including wireless coding/decoding, video processing, and encryption. The chip is well suited to problems involving large amounts of parallel data, such as scientific data applications and datacenter record processing. But don't expect it on shelves near you anytime soon.
"I can't say much because it hasn't been published, but there is a follow-on design whose results we hope to publish next year," Baas told me. "Plus, another design is planned after that which will definitely be fabricated. No plans for a commercialized version currently; however, I do keep in close contact with a number of companies."