HSA Foundation: for Show or for Real?

I recently spent a few days at AMD’s Fusion Developer Summit in Seattle, Washington. Among the many announcements was the introduction of the HSA Foundation, an organization that currently includes AMD, ARM, Imagination, MediaTek, and Texas Instruments. The HSA Foundation was announced to “make it easy to program for parallel computing.” That sounds a bit like an oxymoron, as parallel programming has been the realm of “ninja programmers,” according to Adobe’s Chief Software Architect, Tom Malloy, who spoke at AMD’s event. Given today’s parallel programming challenge, a lot of work needs to be done to make this happen, and in the case of the companies above, that work comes in the form of a foundation. I spent over 20 years planning, developing, and marketing products, and when you first hear the word “foundation” or “consortium,” it conjures up visions of very long and bureaucratic meetings where little gets done and there is a lot of infighting. The fact is, some foundations are like that, but some, like the Linux Foundation, are extremely effective. So which path will the HSA Foundation go down? Let’s drill in.

The Parallel/GPU Challenge

The first thing I must point out is that if CPUs and GPUs keep increasing compute performance at their current pace, the GPU will continue to maintain a raw compute performance advantage over the CPU, so it is very important that the theoretical performance is turned into a real advantage. Next, we must distinguish between serial and parallel processing. Don’t take these as absolutes, as CPUs and GPUs can both run serially and in parallel. Generally speaking, CPUs do a better job on serial, out-of-order code, and GPUs do a better job on parallel, in-order code. I know there are hundreds of dependencies, but work with me here. This is why GPUs do so much better on games and CPUs do so well on things like pattern matching. The reality is, few tasks use just the CPU and few use just the GPU; the two need to work together, and at the same level, to get the parallel processing gains. By working at the same level I mean getting the same access to memory, unlike today, where the CPU really dictates who gets what and when.

A related problem today is that coding for the GPU is very difficult, given the state of the languages and tools. The other challenge is the number of programmers who can write GPU code versus CPU code. According to IDC, over 10M CPU coders exist compared to 100K GPU coders. Adobe calls GPU coders “ninja” developers because the work is so difficult, even with tools like OpenCL and CUDA, because they are such low-level languages. That’s OK for markets like HPC (high-performance computing) and workstations, but not for making tablet, phone and PC applications that could use development environments such as the Android SDK or even Apple’s Xcode. Net-net, there are many challenges for a typical programmer who wants to code a GPU-accelerated app for a phone, tablet, or PC.
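To make the serial versus parallel distinction above a little more concrete, here is a minimal Python sketch. It is only an illustration: the function names (`brighten`, `brighten_all`, `running_total`) are hypothetical, and a thread pool merely stands in for the many execution lanes of a GPU. The first workload is data-parallel because each element is independent; the second carries a dependency from one step to the next, which is the kind of work that tends to stay on the CPU.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel):
    """Independent per-element work: no pixel depends on any other pixel."""
    return min(pixel + 40, 255)


def brighten_all(pixels):
    # Data-parallel: elements may be processed in any order, or all at once.
    # The thread pool is an assumption used only to mimic parallel lanes.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(brighten, pixels))


def running_total(values):
    # Serial dependency: each step needs the result of the previous step,
    # so this naive loop cannot simply be split across workers.
    totals = []
    acc = 0
    for value in values:
        acc += value
        totals.append(acc)
    return totals


if __name__ == "__main__":
    print(brighten_all([0, 100, 230]))
    print(running_total([1, 2, 3, 4]))
```

Real GPU code in OpenCL or CUDA expresses the first pattern as a kernel launched over many elements, with considerably more setup than shown here, which is part of why it is considered hard.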

End User Problem/Opportunity

Without the need to solve an end user or business problem, any foundation is dead in the water. Today NVIDIA is using CUDA (C, C++, C#), OpenCL, and OpenACC, and AMD supports OpenCL, to solve the most complex industrial workloads in existence. As an example, at their GTC developer conference NVIDIA simulated what the galaxy would look like 3.8B years in the future. Intel is using MIC, its Many Integrated Core architecture, to tackle these huge tasks. These technologies are for high-performance computing, not for phones, tablets or PCs. The HSA Foundation is focused on solving next-generation problems and uncovering opportunities in areas like the natural user interface with multi-modal voice, touch and gesture inputs, biometric recognition for multi-modal security, augmented reality, and managing all of the visual content at work and at home. ARM also talked on-stage and in the Q&A about the power savings they believed they could attain from a shared-memory, parallel compute architecture, which surprised me. Considering ARM powers almost 100% of today’s smartphones and tablets around the world, I want to highlight what they said. Programming these kinds of apps at low power, and enabling hundreds of thousands of programmers to do it, ultimately requires very simple tools for creating these apps, and those tools don’t exist today.

The HSA Foundation Solution

The HSA Foundation’s goal, as stated above, is to “make it easy to program for parallel computing.” What does this mean? The HSA Foundation will agree on hardware and software standards. That’s unique in that most initiatives focus on just the hardware or just the software. The goal of the foundation is to literally bend the hardware to fit the software. On the hardware side, this first means agreement on the architectural definition of the shared memory architecture between CPU and GPU. This is required for the CPU and GPU to be at the same level and not be restricted by today’s buses, such as PCI Express. The second version of that memory specification can be found here. The software architecture spec and the programmer reference manual are still in the working group. Ultimately, simple development environments like the Google Android SDK, Apple’s Xcode and Microsoft’s Visual Studio would need to support this holistically to win over the more mainstream, non-ninja programmer. This will be a multi-year effort and will need to be measured on a quarterly basis to really see the progress the foundation is making.
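As a rough way to picture why a shared memory architecture matters, the toy Python model below contrasts two hypothetical devices. `DiscreteDevice` mimics a GPU behind a bus that must copy data in and copy results back; `SharedMemoryDevice` mimics a CPU and GPU working on the same buffer. Both class names, the `bytes_copied` counter, and the doubling "kernel" are assumptions made up for illustration. None of this is an HSA API, and real hardware costs are far more nuanced than a byte count.

```python
class DiscreteDevice:
    """Hypothetical GPU with its own memory, reached over a bus."""

    def __init__(self):
        self.bytes_copied = 0

    def run(self, host_data):
        device_buf = bytearray(host_data)      # copy host -> device
        self.bytes_copied += len(device_buf)
        for i in range(len(device_buf)):       # stand-in for a kernel
            device_buf[i] = (device_buf[i] * 2) % 256
        result = bytearray(device_buf)         # copy device -> host
        self.bytes_copied += len(result)
        return result


class SharedMemoryDevice:
    """Hypothetical GPU that sees the same memory as the CPU."""

    def __init__(self):
        self.bytes_copied = 0

    def run(self, host_data):
        # A memoryview gives access to the caller's buffer without copying.
        with memoryview(host_data) as view:
            for i in range(len(view)):         # same stand-in kernel
                view[i] = (view[i] * 2) % 256
        return host_data                       # same buffer, updated in place


if __name__ == "__main__":
    data = bytearray([1, 2, 3, 4])
    discrete = DiscreteDevice()
    shared = SharedMemoryDevice()
    print(discrete.run(data), discrete.bytes_copied)
    print(shared.run(data), shared.bytes_copied)
```

In this sketch the discrete path moves every byte across the simulated bus twice, while the shared path moves nothing, which is the intuition behind putting the CPU and GPU "at the same level" with respect to memory.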

Foundations are Tricky

The HSA Foundation will encounter the issues every other foundation encounters at some point in its life. First is the challenge of founding members changing their minds or their goals becoming misaligned. This happens a lot: someone who joins stops buying into the premise of the group or staunchly believes it isn’t valuable anymore. Typically that member stops contributing, but could even become a drag on the initiative and need to be voted off. The good news is that today, AMD, ARM, TI, MediaTek and Imagination share a need, as they all need to accelerate parallel processing. The founding members need to make this work for their future businesses to be as successful as they would like.

The second challenge is that the foundation is missing key players in GPUs. NVIDIA is the discrete GPU PC and GPU-compute market share leader, Intel is the PC integrated GPU market share leader, and Qualcomm is the smartphone GPU market share leader. How far can the HSA Foundation get without them? This will ultimately be up to companies like Microsoft, Google and Apple with their development environments. One wild card here is SOC companies with standard ARM licenses. To get agreement on a shared memory architecture, the CPU portion of an ARM SOC would need to be HSA-compliant too, which means that every product derived from a standard ARM license would be HSA-compliant. A product built under an ARM architecture license, like the one Qualcomm has, wouldn’t need to be HSA-compliant.

The third challenge is speed. Committees are guaranteed to be slower than a partnership between two companies and obviously slower than one company. I will be looking for quarterly updates on specifications, standards and tools.

For Show or for Real?

The HSA Foundation is definitely for real and was formed to make a real difference. The hardware is planned to be literally bent to fit the software, and that’s unique. The founding members have a business and technical need; solving the problem means solving huge end user and business problems, so there is demand; and the problem will be difficult to solve without many companies agreeing on an approach. I believe that over time, the foundation will need to get partial or full support from Intel, NVIDIA, and/or Qualcomm to make this initiative as successful as it will need to be to accelerate the benefits of parallel processing on the GPU.