During the opening keynote at Google I/O yesterday, the company announced a new version of its Tensor Processing Unit, TPU 3.0. Though details were incredibly light, CEO Sundar Pichai claimed that TPU 3.0 would have “8x the performance” of the previous generation and that it would require liquid cooling to reach those performance levels. Much of the technical media immediately, and incorrectly, asserted an 8x architectural jump without thinking through the implications or how Google might have arrived at those numbers.
For those who might not be up on the development, Google announced the TPU back in 2016 as an ASIC specifically targeting AI acceleration. As expected, this drew a lot of attention from all corners of the field: it was not only one of the first custom AI accelerator designs, it also came from one of the biggest names in computing. The Tensor Processing Unit targets TensorFlow, a library for machine learning and deep neural networks developed by Google. Unlike other AI training hardware, that does limit the use case for TPU to customers of Google Cloud products and to TensorFlow-based applications.
They are proprietary chips and are not available for external purchase. Just a few months ago, the New York Times reported that Google would begin offering access to TPUs through Google Cloud services. But Google has no shortage of internal AI processing use cases that TPUs can address, from Google Photos to Assistant to Maps.
Looking back to the TPU 3.0 announcement yesterday, there are some interesting caveats about the claims and statements Google made. First, the crowd cheered when it heard this setup was going to require liquid cooling. In reality, this means either that there has been a dramatic reduction in efficiency with the third-generation chip or that the chips are being packed much more tightly into these servers, without room for traditional cooling.
Efficiency drops could mean that Google is pushing the clock speed up on the silicon, ahead of the optimal efficiency curve to get that extra frequency. This is a common tactic in ASIC designs to stretch out performance of existing manufacturing processes or close the gap with competing hardware solutions.
Liquid cooling in enterprise environments isn’t unheard of, but it is less reliable and more costly to integrate.
The extremely exciting performance claims should be tempered somewhat as well. Though the 8x improvement and the statement of 100 PetaFLOPS of performance are impressive, they don’t tell us the whole story. Google was quoting numbers from a “pod”, the term the company uses for a combination of TPU chips and supporting hardware that consumes considerable physical space.
TPU 2.0 pods combined 256 chips, but for TPU 3.0 it appears Google is collecting 1024 into a single unit. Besides the physical size increases that go along with that, a fourfold increase in chip count against an 8x pod-level gain means relative performance for each chip of TPU 3.0 versus TPU 2.0 is about 2x. That’s a sizeable jump, but not unexpected in the ever-changing world of AI algorithms and custom acceleration. There is likely some combination of clock speed and architectural improvement that equates to this doubling of per-chip performance, though with that liquid cooling requirement I lean more towards clock speed jumps.
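The per-chip figure is just the pod-level speedup divided by the growth in chip count. Here is a minimal Python sketch of that division, taking the “8x” pod claim and the 256-chip TPU 2.0 pod as given. The candidate TPU 3.0 pod sizes and the function name `per_chip_speedup` are illustrative assumptions, since Google has not confirmed a chip count.

```python
def per_chip_speedup(pod_speedup, old_chips, new_chips):
    """Per-chip speedup implied by a pod-level speedup and a change in chip count."""
    chip_growth = new_chips / old_chips
    return pod_speedup / chip_growth


POD_SPEEDUP = 8.0      # Google's "8x" claim, which was quoted per pod
TPU2_POD_CHIPS = 256   # TPU 2.0 pod size

# Hypothetical TPU 3.0 pod sizes (assumptions, not confirmed by Google).
for chips in (256, 512, 1024):
    gain = per_chip_speedup(POD_SPEEDUP, TPU2_POD_CHIPS, chips)
    print(f"{chips:>4} chips per pod -> {gain:.1f}x per chip")
```

The implied per-chip gain shrinks as the assumed pod grows, which is why a pod-level 8x on its own says little about the silicon itself.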
Google has not yet shared architectural information about TPU 3.0 or how it has changed from the previous generation. Availability for TPU 3.0 is unknown, and even Cloud TPU (using TPU 2.0) isn’t targeted until the end of 2018.
Google’s development in AI acceleration is certainly interesting and will continue to push the industry forward in key ways. You can see that exemplified by NVIDIA’s integration of Tensor Cores in its Volta GPU architecture last year. But before the market gets up in arms thinking Google is now leading the hardware race, it’s important to put yesterday’s announcement in the right context.